• Open

    DiffusionEngine: Diffusion Model is Scalable Data Engine for Object Detection. (arXiv:2309.03893v1 [cs.CV])
    Data is the cornerstone of deep learning. This paper reveals that the recently developed Diffusion Model is a scalable data engine for object detection. Existing methods for scaling up detection-oriented data often require manual collection or generative models to obtain target images, followed by data augmentation and labeling to produce training pairs, which are costly, complex, or lacking diversity. To address these issues, we presentDiffusionEngine (DE), a data scaling-up engine that provides high-quality detection-oriented training pairs in a single stage. DE consists of a pre-trained diffusion model and an effective Detection-Adapter, contributing to generating scalable, diverse and generalizable detection data in a plug-and-play manner. Detection-Adapter is learned to align the implicit semantic and location knowledge in off-the-shelf diffusion models with detection-aware signals to make better bounding-box predictions. Additionally, we contribute two datasets, i.e., COCO-DE and VOC-DE, to scale up existing detection benchmarks for facilitating follow-up research. Extensive experiments demonstrate that data scaling-up via DE can achieve significant improvements in diverse scenarios, such as various detection algorithms, self-supervised pre-training, data-sparse, label-scarce, cross-domain, and semi-supervised learning. For example, when using DE with a DINO-based adapter to scale up data, mAP is improved by 3.1% on COCO, 7.6% on VOC, and 11.5% on Clipart.  ( 2 min )
    M3FGM:a node masking and multi-granularity message passing-based federated graph model for spatial-temporal data prediction. (arXiv:2210.16193v3 [cs.LG] UPDATED)
    Researchers are solving the challenges of spatial-temporal prediction by combining Federated Learning (FL) and graph models with respect to the constrain of privacy and security. In order to make better use of the power of graph model, some researchs also combine split learning(SL). However, there are still several issues left unattended: 1) Clients might not be able to access the server during inference phase; 2) The graph of clients designed manually in the server model may not reveal the proper relationship between clients. This paper proposes a new GNN-oriented split federated learning method, named node {\bfseries M}asking and {\bfseries M}ulti-granularity {\bfseries M}essage passing-based Federated Graph Model (M$^3$FGM) for the above issues. For the first issue, the server model of M$^3$FGM employs a MaskNode layer to simulate the case of clients being offline. We also redesign the decoder of the client model using a dual-sub-decoders structure so that each client model can use its local data to predict independently when offline. As for the second issue, a new GNN layer named Multi-Granularity Message Passing (MGMP) layer enables each client node to perceive global and local information. We conducted extensive experiments in two different scenarios on two real traffic datasets. Results show that M$^3$FGM outperforms the baselines and variant models, achieves the best results in both datasets and scenarios.  ( 3 min )
    Mixup-Augmented Meta-Learning for Sample-Efficient Fine-Tuning of Protein Simulators. (arXiv:2308.15116v2 [cs.LG] UPDATED)
    Molecular dynamics simulations have emerged as a fundamental instrument for studying biomolecules. At the same time, it is desirable to perform simulations of a collection of particles under various conditions in which the molecules can fluctuate. In this paper, we explore and adapt the soft prompt-based learning method to molecular dynamics tasks. Our model can remarkably generalize to unseen and out-of-distribution scenarios with limited training data. While our work focuses on temperature as a test case, the versatility of our approach allows for efficient simulation through any continuous dynamic conditions, such as pressure and volumes. Our framework has two stages: 1) Pre-trains with data mixing technique, augments molecular structure data and temperature prompts, then applies a curriculum learning method by increasing the ratio of them smoothly. 2) Meta-learning-based fine-tuning framework improves sample-efficiency of fine-tuning process and gives the soft prompt-tuning better initialization points. Comprehensive experiments reveal that our framework excels in accuracy for in-domain data and demonstrates strong generalization capabilities for unseen and out-of-distribution samples.  ( 2 min )
    Proper Learning of Linear Dynamical Systems as a Non-Commutative Polynomial Optimisation Problem. (arXiv:2002.01444v5 [math.OC] UPDATED)
    There has been much recent progress in forecasting the next observation of a linear dynamical system (LDS), which is known as the improper learning, as well as in the estimation of its system matrices, which is known as the proper learning of LDS. We present an approach to proper learning of LDS, which in spite of the non-convexity of the problem, guarantees global convergence of numerical solutions to a least-squares estimator. We present promising computational results.  ( 2 min )
    Blink: Link Local Differential Privacy in Graph Neural Networks via Bayesian Estimation. (arXiv:2309.03190v2 [cs.LG] UPDATED)
    Graph neural networks (GNNs) have gained an increasing amount of popularity due to their superior capability in learning node embeddings for various graph inference tasks, but training them can raise privacy concerns. To address this, we propose using link local differential privacy over decentralized nodes, enabling collaboration with an untrusted server to train GNNs without revealing the existence of any link. Our approach spends the privacy budget separately on links and degrees of the graph for the server to better denoise the graph topology using Bayesian estimation, alleviating the negative impact of LDP on the accuracy of the trained GNNs. We bound the mean absolute error of the inferred link probabilities against the ground truth graph topology. We then propose two variants of our LDP mechanism complementing each other in different privacy settings, one of which estimates fewer links under lower privacy budgets to avoid false positive link estimates when the uncertainty is high, while the other utilizes more information and performs better given relatively higher privacy budgets. Furthermore, we propose a hybrid variant that combines both strategies and is able to perform better across different privacy budgets. Extensive experiments show that our approach outperforms existing methods in terms of accuracy under varying privacy budgets.  ( 3 min )
    AtmoRep: A stochastic model of atmosphere dynamics using large scale representation learning. (arXiv:2308.13280v2 [physics.ao-ph] UPDATED)
    The atmosphere affects humans in a multitude of ways, from loss of life due to adverse weather effects to long-term social and economic impacts on societies. Computer simulations of atmospheric dynamics are, therefore, of great importance for the well-being of our and future generations. Here, we propose AtmoRep, a novel, task-independent stochastic computer model of atmospheric dynamics that can provide skillful results for a wide range of applications. AtmoRep uses large-scale representation learning from artificial intelligence to determine a general description of the highly complex, stochastic dynamics of the atmosphere from the best available estimate of the system's historical trajectory as constrained by observations. This is enabled by a novel self-supervised learning objective and a unique ensemble that samples from the stochastic model with a variability informed by the one in the historical record. The task-independent nature of AtmoRep enables skillful results for a diverse set of applications without specifically training for them and we demonstrate this for nowcasting, temporal interpolation, model correction, and counterfactuals. We also show that AtmoRep can be improved with additional data, for example radar observations, and that it can be extended to tasks such as downscaling. Our work establishes that large-scale neural networks can provide skillful, task-independent models of atmospheric dynamics. With this, they provide a novel means to make the large record of atmospheric observations accessible for applications and for scientific inquiry, complementing existing simulations based on first principles.  ( 3 min )
    Privacy-preserving Continual Federated Clustering via Adaptive Resonance Theory. (arXiv:2309.03487v1 [cs.LG])
    With the increasing importance of data privacy protection, various privacy-preserving machine learning methods have been proposed. In the clustering domain, various algorithms with a federated learning framework (i.e., federated clustering) have been actively studied and showed high clustering performance while preserving data privacy. However, most of the base clusterers (i.e., clustering algorithms) used in existing federated clustering algorithms need to specify the number of clusters in advance. These algorithms, therefore, are unable to deal with data whose distributions are unknown or continually changing. To tackle this problem, this paper proposes a privacy-preserving continual federated clustering algorithm. In the proposed algorithm, an adaptive resonance theory-based clustering algorithm capable of continual learning is used as a base clusterer. Therefore, the proposed algorithm inherits the ability of continual learning. Experimental results with synthetic and real-world datasets show that the proposed algorithm has superior clustering performance to state-of-the-art federated clustering algorithms while realizing data privacy protection and continual learning ability. The source code is available at \url{https://github.com/Masuyama-lab/FCAC}.  ( 2 min )
    Explanation Shift: How Did the Distribution Shift Impact the Model?. (arXiv:2303.08081v2 [cs.LG] UPDATED)
    As input data distributions evolve, the predictive performance of machine learning models tends to deteriorate. In practice, new input data tend to come without target labels. Then, state-of-the-art techniques model input data distributions or model prediction distributions and try to understand issues regarding the interactions between learned models and shifting distributions. We suggest a novel approach that models how explanation characteristics shift when affected by distribution shifts. We find that the modeling of explanation shifts can be a better indicator for detecting out-of-distribution model behaviour than state-of-the-art techniques. We analyze different types of distribution shifts using synthetic examples and real-world data sets. We provide an algorithmic method that allows us to inspect the interaction between data set features and learned models and compare them to the state-of-the-art. We release our methods in an open-source Python package, as well as the code used to reproduce our experiments.  ( 2 min )
    Revisiting Hidden Representations in Transfer Learning for Medical Imaging. (arXiv:2302.08272v2 [cs.CV] UPDATED)
    While a key component to the success of deep learning is the availability of massive amounts of training data, medical image datasets are often limited in diversity and size. Transfer learning has the potential to bridge the gap between related yet different domains. For medical applications, however, it remains unclear whether it is more beneficial to pre-train on natural or medical images. We aim to shed light on this problem by comparing initialization on ImageNet and RadImageNet on seven medical classification tasks. Our work includes a replication study, which yields results contrary to previously published findings. In our experiments, ResNet50 models pre-trained on ImageNet tend to outperform those trained on RadImageNet. To gain further insights, we investigate the learned representations using Canonical Correlation Analysis (CCA) and compare the predictions of the different models. Our results indicate that, contrary to intuition, ImageNet and RadImageNet may converge to distinct intermediate representations, which appear to diverge further during fine-tuning. Despite these distinct representations, the predictions of the models remain similar. Our findings show that the similarity between networks before and after fine-tuning does not correlate with performance gains, suggesting that the advantages of transfer learning might not solely originate from the reuse of features in the early layers of a convolutional neural network.
    A Tutorial on the Non-Asymptotic Theory of System Identification. (arXiv:2309.03873v1 [eess.SY])
    This tutorial serves as an introduction to recently developed non-asymptotic methods in the theory of -- mainly linear -- system identification. We emphasize tools we deem particularly useful for a range of problems in this domain, such as the covering technique, the Hanson-Wright Inequality and the method of self-normalized martingales. We then employ these tools to give streamlined proofs of the performance of various least-squares based estimators for identifying the parameters in autoregressive models. We conclude by sketching out how the ideas presented herein can be extended to certain nonlinear identification problems.
    Medoid Silhouette clustering with automatic cluster number selection. (arXiv:2309.03751v1 [cs.LG])
    The evaluation of clustering results is difficult, highly dependent on the evaluated data set and the perspective of the beholder. There are many different clustering quality measures, which try to provide a general measure to validate clustering results. A very popular measure is the Silhouette. We discuss the efficient medoid-based variant of the Silhouette, perform a theoretical analysis of its properties, provide two fast versions for the direct optimization, and discuss the use to choose the optimal number of clusters. We combine ideas from the original Silhouette with the well-known PAM algorithm and its latest improvements FasterPAM. One of the versions guarantees equal results to the original variant and provides a run speedup of $O(k^2)$. In experiments on real data with 30000 samples and $k$=100, we observed a 10464$\times$ speedup compared to the original PAMMEDSIL algorithm. Additionally, we provide a variant to choose the optimal number of clusters directly.
    Knowledge Distillation Layer that Lets the Student Decide. (arXiv:2309.02843v1 [cs.CV] CROSS LISTED)
    Typical technique in knowledge distillation (KD) is regularizing the learning of a limited capacity model (student) by pushing its responses to match a powerful model's (teacher). Albeit useful especially in the penultimate layer and beyond, its action on student's feature transform is rather implicit, limiting its practice in the intermediate layers. To explicitly embed the teacher's knowledge in feature transform, we propose a learnable KD layer for the student which improves KD with two distinct abilities: i) learning how to leverage the teacher's knowledge, enabling to discard nuisance information, and ii) feeding forward the transferred knowledge deeper. Thus, the student enjoys the teacher's knowledge during the inference besides training. Formally, we repurpose 1x1-BN-ReLU-1x1 convolution block to assign a semantic vector to each local region according to the template (supervised by the teacher) that the corresponding region of the student matches. To facilitate template learning in the intermediate layers, we propose a novel form of supervision based on the teacher's decisions. Through rigorous experimentation, we demonstrate the effectiveness of our approach on 3 popular classification benchmarks. Code is available at: https://github.com/adagorgun/letKD-framework
    On the dynamics of multi agent nonlinear filtering and learning. (arXiv:2309.03557v1 [stat.ML])
    Multiagent systems aim to accomplish highly complex learning tasks through decentralised consensus seeking dynamics and their use has garnered a great deal of attention in the signal processing and computational intelligence societies. This article examines the behaviour of multiagent networked systems with nonlinear filtering/learning dynamics. To this end, a general formulation for the actions of an agent in multiagent networked systems is presented and conditions for achieving a cohesive learning behaviour is given. Importantly, application of the so derived framework in distributed and federated learning scenarios are presented.
    Enhancing Pipeline-Based Conversational Agents with Large Language Models. (arXiv:2309.03748v1 [cs.CL])
    The latest advancements in AI and deep learning have led to a breakthrough in large language model (LLM)-based agents such as GPT-4. However, many commercial conversational agent development tools are pipeline-based and have limitations in holding a human-like conversation. This paper investigates the capabilities of LLMs to enhance pipeline-based conversational agents during two phases: 1) in the design and development phase and 2) during operations. In 1) LLMs can aid in generating training data, extracting entities and synonyms, localization, and persona design. In 2) LLMs can assist in contextualization, intent classification to prevent conversational breakdown and handle out-of-scope questions, auto-correcting utterances, rephrasing responses, formulating disambiguation questions, summarization, and enabling closed question-answering capabilities. We conducted informal experiments with GPT-4 in the private banking domain to demonstrate the scenarios above with a practical example. Companies may be hesitant to replace their pipeline-based agents with LLMs entirely due to privacy concerns and the need for deep integration within their existing ecosystems. A hybrid approach in which LLMs' are integrated into the pipeline-based agents allows them to save time and costs of building and running agents by capitalizing on the capabilities of LLMs while retaining the integration and privacy safeguards of their existing systems.
    How to select an objective function using information theory. (arXiv:2212.06566v2 [cs.LG] UPDATED)
    In machine learning or scientific computing, model performance is measured with an objective function. But why choose one objective over another? Information theory gives one answer: To maximize the information in the model, select the most likely objective function or whichever represents the error in the fewest bits. To evaluate different objectives, transform them into likelihood functions. As likelihoods, their relative magnitudes represent how much we should prefer one objective versus another, and the log of their magnitude represents the expected uncertainty of the model.
    Equal Long-term Benefit Rate: Adapting Static Fairness Notions to Sequential Decision Making. (arXiv:2309.03426v1 [cs.LG])
    Decisions made by machine learning models may have lasting impacts over time, making long-term fairness a crucial consideration. It has been shown that when ignoring the long-term effect, naively imposing fairness criterion in static settings can actually exacerbate bias over time. To explicitly address biases in sequential decision-making, recent works formulate long-term fairness notions in Markov Decision Process (MDP) framework. They define the long-term bias to be the sum of static bias over each time step. However, we demonstrate that naively summing up the step-wise bias can cause a false sense of fairness since it fails to consider the importance difference of different time steps during transition. In this work, we introduce a long-term fairness notion called Equal Long-term Benefit Rate (ELBERT), which explicitly considers varying temporal importance and adapts static fairness principles to the sequential setting. Moreover, we show that the policy gradient of Long-term Benefit Rate can be analytically reduced to standard policy gradient. This makes standard policy optimization methods applicable for reducing the bias, leading to our proposed bias mitigation method ELBERT-PO. Experiments on three sequential decision making environments show that ELBERT-PO significantly reduces bias and maintains high utility. Code is available at https://github.com/Yuancheng-Xu/ELBERT.
    Training Acceleration of Low-Rank Decomposed Networks using Sequential Freezing and Rank Quantization. (arXiv:2309.03824v1 [cs.LG])
    Low Rank Decomposition (LRD) is a model compression technique applied to the weight tensors of deep learning models in order to reduce the number of trainable parameters and computational complexity. However, due to high number of new layers added to the architecture after applying LRD, it may not lead to a high training/inference acceleration if the decomposition ranks are not small enough. The issue is that using small ranks increases the risk of significant accuracy drop after decomposition. In this paper, we propose two techniques for accelerating low rank decomposed models without requiring to use small ranks for decomposition. These methods include rank optimization and sequential freezing of decomposed layers. We perform experiments on both convolutional and transformer-based models. Experiments show that these techniques can improve the model throughput up to 60% during training and 37% during inference when combined together while preserving the accuracy close to that of the original models
    Achieving Occam's Razor: Deep Learning for Optimal Model Reduction. (arXiv:2303.13746v2 [cs.LG] UPDATED)
    All fields of science depend on mathematical models. Occam's razor refers to the principle that good models should exclude parameters beyond those minimally required to describe the systems they represent. This is because redundancy can lead to incorrect estimates of model parameters from data, and thus inaccurate or ambiguous conclusions. Here, we show how deep learning can be powerfully leveraged to address Occam's razor. FixFit, our new method, uses a feedforward deep neural network with a bottleneck layer to characterize and predict the behavior of a given model from its input parameters. FixFit has three major benefits. First, it provides a metric to quantify the original model's degree of complexity. Second, it allows for the unique fitting of data. Third, it provides an unbiased way to discriminate between experimental hypotheses that add value versus those that do not. In two use cases, we demonstrate the broad applicability of this method across scientific domains. To validate the method using a known system, we apply FixFit to recover known composite parameters for the Kepler orbit model. To illustrate how the method can be applied to less well-established fields, we use it to identify parameters for a multi-scale brain model and reduce the search space for viable candidate mechanisms.
    Deep Learning Safety Concerns in Automated Driving Perception. (arXiv:2309.03774v1 [cs.LG])
    Recent advances in the field of deep learning and impressive performance of deep neural networks (DNNs) for perception have resulted in an increased demand for their use in automated driving (AD) systems. The safety of such systems is of utmost importance and thus requires to consider the unique properties of DNNs. In order to achieve safety of AD systems with DNN-based perception components in a systematic and comprehensive approach, so-called safety concerns have been introduced as a suitable structuring element. On the one hand, the concept of safety concerns is -- by design -- well aligned to existing standards relevant for safety of AD systems such as ISO 21448 (SOTIF). On the other hand, it has already inspired several academic publications and upcoming standards on AI safety such as ISO PAS 8800. While the concept of safety concerns has been previously introduced, this paper extends and refines it, leveraging feedback from various domain and safety experts in the field. In particular, this paper introduces an additional categorization for a better understanding as well as enabling cross-functional teams to jointly address the concerns.
    Evaluating the Efficacy of Supervised Learning vs Large Language Models for Identifying Cognitive Distortions and Suicidal Risks in Chinese Social Media. (arXiv:2309.03564v1 [cs.CL])
    Large language models, particularly those akin to the rapidly progressing GPT series, are gaining traction for their expansive influence. While there is keen interest in their applicability within medical domains such as psychology, tangible explorations on real-world data remain scant. Concurrently, users on social media platforms are increasingly vocalizing personal sentiments; under specific thematic umbrellas, these sentiments often manifest as negative emotions, sometimes escalating to suicidal inclinations. Timely discernment of such cognitive distortions and suicidal risks is crucial to effectively intervene and potentially avert dire circumstances. Our study ventured into this realm by experimenting on two pivotal tasks: suicidal risk and cognitive distortion identification on Chinese social media platforms. Using supervised learning as a baseline, we examined and contrasted the efficacy of large language models via three distinct strategies: zero-shot, few-shot, and fine-tuning. Our findings revealed a discernible performance gap between the large language models and traditional supervised learning approaches, primarily attributed to the models' inability to fully grasp subtle categories. Notably, while GPT-4 outperforms its counterparts in multiple scenarios, GPT-3.5 shows significant enhancement in suicide risk classification after fine-tuning. To our knowledge, this investigation stands as the maiden attempt at gauging large language models on Chinese social media tasks. This study underscores the forward-looking and transformative implications of using large language models in the field of psychology. It lays the groundwork for future applications in psychological research and practice.
    Punctate White Matter Lesion Segmentation in Preterm Infants Powered by Counterfactually Generative Learning. (arXiv:2309.03440v1 [eess.IV])
    Accurate segmentation of punctate white matter lesions (PWMLs) are fundamental for the timely diagnosis and treatment of related developmental disorders. Automated PWMLs segmentation from infant brain MR images is challenging, considering that the lesions are typically small and low-contrast, and the number of lesions may dramatically change across subjects. Existing learning-based methods directly apply general network architectures to this challenging task, which may fail to capture detailed positional information of PWMLs, potentially leading to severe under-segmentations. In this paper, we propose to leverage the idea of counterfactual reasoning coupled with the auxiliary task of brain tissue segmentation to learn fine-grained positional and morphological representations of PWMLs for accurate localization and segmentation. A simple and easy-to-implement deep-learning framework (i.e., DeepPWML) is accordingly designed. It combines the lesion counterfactual map with the tissue probability map to train a lightweight PWML segmentation network, demonstrating state-of-the-art performance on a real-clinical dataset of infant T1w MR images. The code is available at \href{https://github.com/ladderlab-xjtu/DeepPWML}{https://github.com/ladderlab-xjtu/DeepPWML}.
    M(otion)-mode Based Prediction of Ejection Fraction using Echocardiograms. (arXiv:2309.03759v1 [eess.IV])
    Early detection of cardiac dysfunction through routine screening is vital for diagnosing cardiovascular diseases. An important metric of cardiac function is the left ventricular ejection fraction (EF), where lower EF is associated with cardiomyopathy. Echocardiography is a popular diagnostic tool in cardiology, with ultrasound being a low-cost, real-time, and non-ionizing technology. However, human assessment of echocardiograms for calculating EF is time-consuming and expertise-demanding, raising the need for an automated approach. In this work, we propose using the M(otion)-mode of echocardiograms for estimating the EF and classifying cardiomyopathy. We generate multiple artificial M-mode images from a single echocardiogram and combine them using off-the-shelf model architectures. Additionally, we extend contrastive learning (CL) to cardiac imaging to learn meaningful representations from exploiting structures in unlabeled data allowing the model to achieve high accuracy, even with limited annotations. Our experiments show that the supervised setting converges with only ten modes and is comparable to the baseline method while bypassing its cumbersome training process and being computationally much more efficient. Furthermore, CL using M-mode images is helpful for limited data scenarios, such as having labels for only 200 patients, which is common in medical applications.
    Conformal Autoregressive Generation: Beam Search with Coverage Guarantees. (arXiv:2309.03797v1 [cs.LG])
    We introduce two new extensions to the beam search algorithm based on conformal predictions (CP) to produce sets of sequences with theoretical coverage guarantees. The first method is very simple and proposes dynamically-sized subsets of beam search results but, unlike typical CP procedures, has an upper bound on the achievable guarantee depending on a post-hoc calibration measure. Our second algorithm introduces the conformal set prediction procedure as part of the decoding process, producing a variable beam width which adapts to the current uncertainty. While more complex, this procedure can achieve coverage guarantees selected a priori. We provide marginal coverage bounds for each method, and evaluate them empirically on a selection of tasks drawing from natural language processing and chemistry.
    REBOOT: Reuse Data for Bootstrapping Efficient Real-World Dexterous Manipulation. (arXiv:2309.03322v1 [cs.LG])
    Dexterous manipulation tasks involving contact-rich interactions pose a significant challenge for both model-based control systems and imitation learning algorithms. The complexity arises from the need for multi-fingered robotic hands to dynamically establish and break contacts, balance non-prehensile forces, and control large degrees of freedom. Reinforcement learning (RL) offers a promising approach due to its general applicability and capacity to autonomously acquire optimal manipulation strategies. However, its real-world application is often hindered by the necessity to generate a large number of samples, reset the environment, and obtain reward signals. In this work, we introduce an efficient system for learning dexterous manipulation skills with RL to alleviate these challenges. The main idea of our approach is the integration of recent advances in sample-efficient RL and replay buffer bootstrapping. This combination allows us to utilize data from different tasks or objects as a starting point for training new tasks, significantly improving learning efficiency. Additionally, our system completes the real-world training cycle by incorporating learned resets via an imitation-based pickup policy as well as learned reward functions, eliminating the need for manual resets and reward engineering. We demonstrate the benefits of reusing past data as replay buffer initialization for new tasks, for instance, the fast acquisition of intricate manipulation skills in the real world on a four-fingered robotic hand. (Videos: https://sites.google.com/view/reboot-dexterous)
    Natural Example-Based Explainability: a Survey. (arXiv:2309.03234v1 [cs.AI])
    Explainable Artificial Intelligence (XAI) has become increasingly significant for improving the interpretability and trustworthiness of machine learning models. While saliency maps have stolen the show for the last few years in the XAI field, their ability to reflect models' internal processes has been questioned. Although less in the spotlight, example-based XAI methods have continued to improve. It encompasses methods that use examples as explanations for a machine learning model's predictions. This aligns with the psychological mechanisms of human reasoning and makes example-based explanations natural and intuitive for users to understand. Indeed, humans learn and reason by forming mental representations of concepts based on examples. This paper provides an overview of the state-of-the-art in natural example-based XAI, describing the pros and cons of each approach. A "natural" example simply means that it is directly drawn from the training data without involving any generative process. The exclusion of methods that require generating examples is justified by the need for plausibility which is in some regards required to gain a user's trust. Consequently, this paper will explore the following family of methods: similar examples, counterfactual and semi-factual, influential instances, prototypes, and concepts. In particular, it will compare their semantic definition, their cognitive impact, and added values. We hope it will encourage and facilitate future work on natural example-based XAI.
    The Space of Adversarial Strategies. (arXiv:2209.04521v2 [cs.CR] UPDATED)
    Adversarial examples, inputs designed to induce worst-case behavior in machine learning models, have been extensively studied over the past decade. Yet, our understanding of this phenomenon stems from a rather fragmented pool of knowledge; at present, there are a handful of attacks, each with disparate assumptions in threat models and incomparable definitions of optimality. In this paper, we propose a systematic approach to characterize worst-case (i.e., optimal) adversaries. We first introduce an extensible decomposition of attacks in adversarial machine learning by atomizing attack components into surfaces and travelers. With our decomposition, we enumerate over components to create 576 attacks (568 of which were previously unexplored). Next, we propose the Pareto Ensemble Attack (PEA): a theoretical attack that upper-bounds attack performance. With our new attacks, we measure performance relative to the PEA on: both robust and non-robust models, seven datasets, and three extended lp-based threat models incorporating compute costs, formalizing the Space of Adversarial Strategies. From our evaluation we find that attack performance to be highly contextual: the domain, model robustness, and threat model can have a profound influence on attack efficacy. Our investigation suggests that future studies measuring the security of machine learning should: (1) be contextualized to the domain & threat models, and (2) go beyond the handful of known attacks used today.
    Spatio-Temporal Contrastive Self-Supervised Learning for POI-level Crowd Flow Inference. (arXiv:2309.03239v1 [cs.LG])
    Accurate acquisition of crowd flow at Points of Interest (POIs) is pivotal for effective traffic management, public service, and urban planning. Despite this importance, due to the limitations of urban sensing techniques, the data quality from most sources is inadequate for monitoring crowd flow at each POI. This renders the inference of accurate crowd flow from low-quality data a critical and challenging task. The complexity is heightened by three key factors: 1) \emph{The scarcity and rarity of labeled data}, 2) \emph{The intricate spatio-temporal dependencies among POIs}, and 3) \emph{The myriad correlations between precise crowd flow and GPS reports}. To address these challenges, we recast the crowd flow inference problem as a self-supervised attributed graph representation learning task and introduce a novel \underline{C}ontrastive \underline{S}elf-learning framework for \underline{S}patio-\underline{T}emporal data (\model). Our approach initiates with the construction of a spatial adjacency graph founded on the POIs and their respective distances. We then employ a contrastive learning technique to exploit large volumes of unlabeled spatio-temporal data. We adopt a swapped prediction approach to anticipate the representation of the target subgraph from similar instances. Following the pre-training phase, the model is fine-tuned with accurate crowd flow data. Our experiments, conducted on two real-world datasets, demonstrate that the \model pre-trained on extensive noisy data consistently outperforms models trained from scratch.
    Let Quantum Neural Networks Choose Their Own Frequencies. (arXiv:2309.03279v1 [quant-ph])
    Parameterized quantum circuits as machine learning models are typically well described by their representation as a partial Fourier series of the input features, with frequencies uniquely determined by the feature map's generator Hamiltonians. Ordinarily, these data-encoding generators are chosen in advance, fixing the space of functions that can be represented. In this work we consider a generalization of quantum models to include a set of trainable parameters in the generator, leading to a trainable frequency (TF) quantum model. We numerically demonstrate how TF models can learn generators with desirable properties for solving the task at hand, including non-regularly spaced frequencies in their spectra and flexible spectral richness. Finally, we showcase the real-world effectiveness of our approach, demonstrating an improved accuracy in solving the Navier-Stokes equations using a TF model with only a single parameter added to each encoding operation. Since TF models encompass conventional fixed frequency models, they may offer a sensible default choice for variational quantum machine learning.
    A comparison of rational and neural network based approximations. (arXiv:2303.04436v2 [math.OC] UPDATED)
    Rational and neural network based approximations are efficient tools in modern approximation. These approaches are able to produce accurate approximations to nonsmooth and non-Lipschitz functions, including multivariate domain functions. In this paper we compare the efficiency of function approximation using rational approximation, neural network and their combinations. It was found that rational approximation is superior to neural network based approaches with the same number of decision variables. Our numerical experiments demonstrate the efficiency of rational approximation, even when the number of approximation parameters (that is, the dimension of the corresponding optimisation problems) is small. Another important contribution of this paper lies in the improvement of rational approximation algorithms. Namely, the optimisation based algorithms for rational approximation can be adjusted to in such a way that the conditioning number of the constraint matrices are controlled. This simple adjustment enables us to work with high dimension optimisation problems and improve the design of the neural network. The main strength of neural networks is in their ability to handle models with a large number of variables: complex models are decomposed in several simple optimisation problems. Therefore the the large number of decision variables is in the nature of neural networks.
    Fitness Approximation through Machine Learning. (arXiv:2309.03318v1 [cs.NE])
    We present a novel approach to performing fitness approximation in genetic algorithms (GAs) using machine-learning (ML) models, focusing on evolutionary agents in Gymnasium (game) simulators -- where fitness computation is costly. Maintaining a dataset of sampled individuals along with their actual fitness scores, we continually update throughout an evolutionary run a fitness-approximation ML model. We compare different methods for: 1) switching between actual and approximate fitness, 2) sampling the population, and 3) weighting the samples. Experimental findings demonstrate significant improvement in evolutionary runtimes, with fitness scores that are either identical or slightly lower than that of the fully run GA -- depending on the ratio of approximate-to-actual-fitness computation. Our approach is generic and can be easily applied to many different domains.
    Dynamic Causal Graph Convolutional Network for Traffic Prediction. (arXiv:2306.07019v2 [cs.LG] UPDATED)
    Modeling complex spatiotemporal dependencies in correlated traffic series is essential for traffic prediction. While recent works have shown improved prediction performance by using neural networks to extract spatiotemporal correlations, their effectiveness depends on the quality of the graph structures used to represent the spatial topology of the traffic network. In this work, we propose a novel approach for traffic prediction that embeds time-varying dynamic Bayesian network to capture the fine spatiotemporal topology of traffic data. We then use graph convolutional networks to generate traffic forecasts. To enable our method to efficiently model nonlinear traffic propagation patterns, we develop a deep learning-based module as a hyper-network to generate stepwise dynamic causal graphs. Our experimental results on a real traffic dataset demonstrate the superior prediction performance of the proposed method. The code is available at https://github.com/MonBG/DCGCN.
    Acoustic-to-articulatory inversion for dysarthric speech: Are pre-trained self-supervised representations favorable?. (arXiv:2309.01108v2 [eess.AS] UPDATED)
    $ $Acoustic-to-articulatory inversion (AAI) involves mapping from the acoustic space to the articulatory space. Signal-processing features like the MFCCs, have been widely used for the AAI task. For subjects with dysarthric speech, AAI is challenging because of an imprecise and indistinct pronunciation. In this work, we perform AAI for dysarthric speech using representations from pre-trained self-supervised learning (SSL) models. We demonstrate the impact of different pre-trained features on this challenging AAI task, at low-resource conditions. In addition, we also condition x-vectors to the extracted SSL features to train a BLSTM network. In the seen case, we experiment with three AAI training schemes (subject-specific, pooled, and fine-tuned). The results, consistent across training schemes, reveal that DeCoAR, in the fine-tuned scheme, achieves a relative improvement of the Pearson Correlation Coefficient (CC) by ${\sim}$1.81\% and ${\sim}$4.56\% for healthy controls and patients, respectively, over MFCCs. In the unseen case, we observe similar average trends for different SSL features. Overall, SSL networks like wav2vec, APC, and DeCoAR, which are trained with feature reconstruction or future timestep prediction tasks, perform well in predicting dysarthric articulatory trajectories.
    Evaluating Explanation Methods for Multivariate Time Series Classification. (arXiv:2308.15223v2 [cs.LG] UPDATED)
    Multivariate time series classification is an important computational task arising in applications where data is recorded over time and over multiple channels. For example, a smartwatch can record the acceleration and orientation of a person's motion, and these signals are recorded as multivariate time series. We can classify this data to understand and predict human movement and various properties such as fitness levels. In many applications classification alone is not enough, we often need to classify but also understand what the model learns (e.g., why was a prediction given, based on what information in the data). The main focus of this paper is on analysing and evaluating explanation methods tailored to Multivariate Time Series Classification (MTSC). We focus on saliency-based explanation methods that can point out the most relevant channels and time series points for the classification decision. We analyse two popular and accurate multivariate time series classifiers, ROCKET and dResNet, as well as two popular explanation methods, SHAP and dCAM. We study these methods on 3 synthetic datasets and 2 real-world datasets and provide a quantitative and qualitative analysis of the explanations provided. We find that flattening the multivariate datasets by concatenating the channels works as well as using multivariate classifiers directly and adaptations of SHAP for MTSC work quite well. Additionally, we also find that the popular synthetic datasets we used are not suitable for time series analysis.
    Truncated Diffusion Probabilistic Models and Diffusion-based Adversarial Auto-Encoders. (arXiv:2202.09671v4 [stat.ML] UPDATED)
    Employing a forward diffusion chain to gradually map the data to a noise distribution, diffusion-based generative models learn how to generate the data by inferring a reverse diffusion chain. However, this approach is slow and costly because it needs many forward and reverse steps. We propose a faster and cheaper approach that adds noise not until the data become pure random noise, but until they reach a hidden noisy data distribution that we can confidently learn. Then, we use fewer reverse steps to generate data by starting from this hidden distribution that is made similar to the noisy data. We reveal that the proposed model can be cast as an adversarial auto-encoder empowered by both the diffusion process and a learnable implicit prior. Experimental results show even with a significantly smaller number of reverse diffusion steps, the proposed truncated diffusion probabilistic models can provide consistent improvements over the non-truncated ones in terms of performance in both unconditional and text-guided image generations.
    RatGPT: Turning online LLMs into Proxies for Malware Attacks. (arXiv:2308.09183v2 [cs.CR] UPDATED)
    The evolution of Generative AI and the capabilities of the newly released Large Language Models (LLMs) open new opportunities in software engineering. However, they also lead to new challenges in cybersecurity. Recently, researchers have shown the possibilities of using LLMs such as ChatGPT to generate malicious content that can directly be exploited or guide inexperienced hackers to weaponize tools and code. These studies covered scenarios that still require the attacker to be in the middle of the loop. In this study, we leverage openly available plugins and use an LLM as proxy between the attacker and the victim. We deliver a proof-of-concept where ChatGPT is used for the dissemination of malicious software while evading detection, alongside establishing the communication to a command and control (C2) server to receive commands to interact with a victim's system. Finally, we present the general approach as well as essential elements in order to stay undetected and make the attack a success. This proof-of-concept highlights significant cybersecurity issues with openly available plugins and LLMs, which require the development of security guidelines, controls, and mitigation strategies.
    Short-Term Load Forecasting Using A Particle-Swarm Optimized Multi-Head Attention-Augmented CNN-LSTM Network. (arXiv:2309.03694v1 [cs.LG])
    Short-term load forecasting is of paramount importance in the efficient operation and planning of power systems, given its inherent non-linear and dynamic nature. Recent strides in deep learning have shown promise in addressing this challenge. However, these methods often grapple with hyperparameter sensitivity, opaqueness in interpretability, and high computational overhead for real-time deployment. In this paper, I propose a novel solution that surmounts these obstacles. Our approach harnesses the power of the Particle-Swarm Optimization algorithm to autonomously explore and optimize hyperparameters, a Multi-Head Attention mechanism to discern the salient features crucial for accurate forecasting, and a streamlined framework for computational efficiency. Our method undergoes rigorous evaluation using a genuine electricity demand dataset. The results underscore its superiority in terms of accuracy, robustness, and computational efficiency. Notably, our Mean Absolute Percentage Error of 1.9376 marks a significant advancement over existing state-of-the-art approaches, heralding a new era in short-term load forecasting.
    Polynomial Bounds for Learning Noisy Optical Physical Unclonable Functions and Connections to Learning With Errors. (arXiv:2308.09199v2 [cs.LG] UPDATED)
    It is shown that a class of optical physical unclonable functions (PUFs) can be learned to arbitrary precision with arbitrarily high probability, even in the presence of noise, given access to polynomially many challenge-response pairs and polynomially bounded computational power, under mild assumptions about the distributions of the noise and challenge vectors. This extends the results of Rh\"uramir et al. (2013), who showed a subset of this class of PUFs to be learnable in polynomial time in the absence of noise, under the assumption that the optics of the PUF were either linear or had negligible nonlinear effects. We derive polynomial bounds for the required number of samples and the computational complexity of a linear regression algorithm, based on size parameters of the PUF, the distributions of the challenge and noise vectors, and the probability and accuracy of the regression algorithm, with a similar analysis to one done by Bootle et al. (2018), who demonstrated a learning attack on a poorly implemented version of the Learning With Errors problem.
    Impression-Informed Multi-Behavior Recommender System: A Hierarchical Graph Attention Approach. (arXiv:2309.03169v2 [cs.IR] UPDATED)
    While recommender systems have significantly benefited from implicit feedback, they have often missed the nuances of multi-behavior interactions between users and items. Historically, these systems either amalgamated all behaviors, such as \textit{impression} (formerly \textit{view}), \textit{add-to-cart}, and \textit{buy}, under a singular 'interaction' label, or prioritized only the target behavior, often the \textit{buy} action, discarding valuable auxiliary signals. Although recent advancements tried addressing this simplification, they primarily gravitated towards optimizing the target behavior alone, battling with data scarcity. Additionally, they tended to bypass the nuanced hierarchy intrinsic to behaviors. To bridge these gaps, we introduce the \textbf{H}ierarchical \textbf{M}ulti-behavior \textbf{G}raph Attention \textbf{N}etwork (HMGN). This pioneering framework leverages attention mechanisms to discern information from both inter and intra-behaviors while employing a multi-task Hierarchical Bayesian Personalized Ranking (HBPR) for optimization. Recognizing the need for scalability, our approach integrates a specialized multi-behavior sub-graph sampling technique. Moreover, the adaptability of HMGN allows for the seamless inclusion of knowledge metadata and time-series data. Empirical results attest to our model's prowess, registering a notable performance boost of up to 64\% in NDCG@100 metrics over conventional graph neural network methods.
    Bridging the Gap Between Target Networks and Functional Regularization. (arXiv:2106.02613v4 [stat.ML] UPDATED)
    Bootstrapping is behind much of the successes of deep Reinforcement Learning. However, learning the value function via bootstrapping often leads to unstable training due to fast-changing target values. Target Networks are employed to stabilize training by using an additional set of lagging parameters to estimate the target values. Despite the popularity of Target Networks, their effect on the optimization is still misunderstood. In this work, we show that they act as an implicit regularizer which can be beneficial in some cases, but also have disadvantages such as being inflexible and can result in instabilities, even when vanilla TD(0) converges. To overcome these issues, we propose an explicit Functional Regularization alternative that is flexible and a convex regularizer in function space and we theoretically study its convergence. We conduct an experimental study across a range of environments, discount factors, and off-policiness data collections to investigate the effectiveness of the regularization induced by Target Networks and Functional Regularization in terms of performance, accuracy, and stability. Our findings emphasize that Functional Regularization can be used as a drop-in replacement for Target Networks and result in performance improvement. Furthermore, adjusting both the regularization weight and the network update period in Functional Regularization can result in further performance improvements compared to solely adjusting the network update period as typically done with Target Networks. Our approach also enhances the ability to networks to recover accurate $Q$-values.
    Empirical Risk Minimization for Losses without Variance. (arXiv:2309.03818v1 [stat.ML])
    This paper considers an empirical risk minimization problem under heavy-tailed settings, where data does not have finite variance, but only has $p$-th moment with $p \in (1,2)$. Instead of using estimation procedure based on truncated observed data, we choose the optimizer by minimizing the risk value. Those risk values can be robustly estimated via using the remarkable Catoni's method (Catoni, 2012). Thanks to the structure of Catoni-type influence functions, we are able to establish excess risk upper bounds via using generalized generic chaining methods. Moreover, we take computational issues into consideration. We especially theoretically investigate two types of optimization methods, robust gradient descent algorithm and empirical risk-based methods. With an extensive numerical study, we find that the optimizer based on empirical risks via Catoni-style estimation indeed shows better performance than other baselines. It indicates that estimation directly based on truncated data may lead to unsatisfactory results.
    Comparing Sequential Forecasters. (arXiv:2110.00115v5 [stat.ME] UPDATED)
    Consider two forecasters, each making a single prediction for a sequence of events over time. We ask a relatively basic question: how might we compare these forecasters, either online or post-hoc, while avoiding unverifiable assumptions on how the forecasts and outcomes were generated? In this paper, we present a rigorous answer to this question by designing novel sequential inference procedures for estimating the time-varying difference in forecast scores. To do this, we employ confidence sequences (CS), which are sequences of confidence intervals that can be continuously monitored and are valid at arbitrary data-dependent stopping times ("anytime-valid"). The widths of our CSs are adaptive to the underlying variance of the score differences. Underlying their construction is a game-theoretic statistical framework, in which we further identify e-processes and p-processes for sequentially testing a weak null hypothesis -- whether one forecaster outperforms another on average (rather than always). Our methods do not make distributional assumptions on the forecasts or outcomes; our main theorems apply to any bounded scores, and we later provide alternative methods for unbounded scores. We empirically validate our approaches by comparing real-world baseball and weather forecasters.
    Domain Adaptation for Efficiently Fine-tuning Vision Transformer with Encrypted Images. (arXiv:2309.02556v2 [cs.CV] UPDATED)
    In recent years, deep neural networks (DNNs) trained with transformed data have been applied to various applications such as privacy-preserving learning, access control, and adversarial defenses. However, the use of transformed data decreases the performance of models. Accordingly, in this paper, we propose a novel method for fine-tuning models with transformed images under the use of the vision transformer (ViT). The proposed domain adaptation method does not cause the accuracy degradation of models, and it is carried out on the basis of the embedding structure of ViT. In experiments, we confirmed that the proposed method prevents accuracy degradation even when using encrypted images with the CIFAR-10 and CIFAR-100 datasets.
    Efficient anti-symmetrization of a neural network layer by taming the sign problem. (arXiv:2205.12250v2 [cs.LG] UPDATED)
    Explicit antisymmetrization of a neural network is a potential candidate for a universal function approximator for generic antisymmetric functions, which are ubiquitous in quantum physics. However, this procedure is a priori factorially costly to implement, making it impractical for large numbers of particles. The strategy also suffers from a sign problem. Namely, due to near-exact cancellation of positive and negative contributions, the magnitude of the antisymmetrized function may be significantly smaller than before anti-symmetrization. We show that the anti-symmetric projection of a two-layer neural network can be evaluated efficiently, opening the door to using a generic antisymmetric layer as a building block in anti-symmetric neural network Ansatzes. This approximation is effective when the sign problem is controlled, and we show that this property depends crucially the choice of activation function under standard Xavier/He initialization methods. As a consequence, using a smooth activation function requires re-scaling of the neural network weights compared to standard initializations.
    Unlearnable Examples Give a False Sense of Security: Piercing through Unexploitable Data with Learnable Examples. (arXiv:2305.09241v4 [cs.LG] UPDATED)
    Safeguarding data from unauthorized exploitation is vital for privacy and security, especially in recent rampant research in security breach such as adversarial/membership attacks. To this end, \textit{unlearnable examples} (UEs) have been recently proposed as a compelling protection, by adding imperceptible perturbation to data so that models trained on them cannot classify them accurately on original clean distribution. Unfortunately, we find UEs provide a false sense of security, because they cannot stop unauthorized users from utilizing other unprotected data to remove the protection, by turning unlearnable data into learnable again. Motivated by this observation, we formally define a new threat by introducing \textit{learnable unauthorized examples} (LEs) which are UEs with their protection removed. The core of this approach is a novel purification process that projects UEs onto the manifold of LEs. This is realized by a new joint-conditional diffusion model which denoises UEs conditioned on the pixel and perceptual similarity between UEs and LEs. Extensive experiments demonstrate that LE delivers state-of-the-art countering performance against both supervised UEs and unsupervised UEs in various scenarios, which is the first generalizable countermeasure to UEs across supervised learning and unsupervised learning. Our code is available at \url{https://github.com/jiangw-0/LE_JCDP}.
    Enhancing Deep Learning Models through Tensorization: A Comprehensive Survey and Framework. (arXiv:2309.02428v2 [cs.LG] UPDATED)
    The burgeoning growth of public domain data and the increasing complexity of deep learning model architectures have underscored the need for more efficient data representation and analysis techniques. This paper is motivated by the work of Helal (2023) and aims to present a comprehensive overview of tensorization. This transformative approach bridges the gap between the inherently multidimensional nature of data and the simplified 2-dimensional matrices commonly used in linear algebra-based machine learning algorithms. This paper explores the steps involved in tensorization, multidimensional data sources, various multiway analysis methods employed, and the benefits of these approaches. A small example of Blind Source Separation (BSS) is presented comparing 2-dimensional algorithms and a multiway algorithm in Python. Results indicate that multiway analysis is more expressive. Contrary to the intuition of the dimensionality curse, utilising multidimensional datasets in their native form and applying multiway analysis methods grounded in multilinear algebra reveal a profound capacity to capture intricate interrelationships among various dimensions while, surprisingly, reducing the number of model parameters and accelerating processing. A survey of the multi-away analysis methods and integration with various Deep Neural Networks models is presented using case studies in different domains.
    ArtiGrasp: Physically Plausible Synthesis of Bi-Manual Dexterous Grasping and Articulation. (arXiv:2309.03891v1 [cs.RO])
    We present ArtiGrasp, a novel method to synthesize bi-manual hand-object interactions that include grasping and articulation. This task is challenging due to the diversity of the global wrist motions and the precise finger control that are necessary to articulate objects. ArtiGrasp leverages reinforcement learning and physics simulations to train a policy that controls the global and local hand pose. Our framework unifies grasping and articulation within a single policy guided by a single hand pose reference. Moreover, to facilitate the training of the precise finger control required for articulation, we present a learning curriculum with increasing difficulty. It starts with single-hand manipulation of stationary objects and continues with multi-agent training including both hands and non-stationary objects. To evaluate our method, we introduce Dynamic Object Grasping and Articulation, a task that involves bringing an object into a target articulated pose. This task requires grasping, relocation, and articulation. We show our method's efficacy towards this task. We further demonstrate that our method can generate motions with noisy hand-object pose estimates from an off-the-shelf image-based regressor.
    USE-Evaluator: Performance Metrics for Medical Image Segmentation Models with Uncertain, Small or Empty Reference Annotations. (arXiv:2209.13008v4 [eess.IV] UPDATED)
    Performance metrics for medical image segmentation models are used to measure the agreement between the reference annotation and the predicted segmentation. Usually, overlap metrics, such as the Dice, are used as a metric to evaluate the performance of these models in order for results to be comparable. However, there is a mismatch between the distributions of cases and difficulty level of segmentation tasks in public data sets compared to clinical practice. Common metrics fail to measure the impact of this mismatch, especially for clinical data sets that include low signal pathologies, a difficult segmentation task, and uncertain, small, or empty reference annotations. This limitation may result in ineffective research of machine learning practitioners in designing and optimizing models. Dimensions of evaluating clinical value include consideration of the uncertainty of reference annotations, independence from reference annotation volume size, and evaluation of classification of empty reference annotations. We study how uncertain, small, and empty reference annotations influence the value of metrics for medical image segmentation on an in-house data set regardless of the model. We examine metrics behavior on the predictions of a standard deep learning framework in order to identify metrics with clinical value. We compare to a public benchmark data set (BraTS 2019) with a high-signal pathology and certain, larger, and no empty reference annotations. We may show machine learning practitioners, how uncertain, small, or empty reference annotations require a rethinking of the evaluation and optimizing procedures. The evaluation code was released to encourage further analysis of this topic. https://github.com/SophieOstmeier/UncertainSmallEmpty.git
    Learning from Demonstration via Probabilistic Diagrammatic Teaching. (arXiv:2309.03835v1 [cs.RO])
    Learning for Demonstration (LfD) enables robots to acquire new skills by imitating expert demonstrations, allowing users to communicate their instructions in an intuitive manner. Recent progress in LfD often relies on kinesthetic teaching or teleoperation as the medium for users to specify the demonstrations. Kinesthetic teaching requires physical handling of the robot, while teleoperation demands proficiency with additional hardware. This paper introduces an alternative paradigm for LfD called Diagrammatic Teaching. Diagrammatic Teaching aims to teach robots novel skills by prompting the user to sketch out demonstration trajectories on 2D images of the scene, these are then synthesised as a generative model of motion trajectories in 3D task space. Additionally, we present the Ray-tracing Probabilistic Trajectory Learning (RPTL) framework for Diagrammatic Teaching. RPTL extracts time-varying probability densities from the 2D sketches, applies ray-tracing to find corresponding regions in 3D Cartesian space, and fits a probabilistic model of motion trajectories to these regions. New motion trajectories, which mimic those sketched by the user, can then be generated from the probabilistic model. We empirically validate our framework both in simulation and on real robots, which include a fixed-base manipulator and a quadruped-mounted manipulator.
    Diffusion-EDFs: Bi-equivariant Denoising Generative Modeling on SE(3) for Visual Robotic Manipulation. (arXiv:2309.02685v2 [cs.RO] UPDATED)
    Recent studies have verified that equivariant methods can significantly improve the data efficiency, generalizability, and robustness in robot learning. Meanwhile, denoising diffusion-based generative modeling has recently gained significant attention as a promising approach for robotic manipulation learning from demonstrations with stochastic behaviors. In this paper, we present Diffusion-EDFs, a novel approach that incorporates spatial roto-translation equivariance, i.e., SE(3)-equivariance to diffusion generative modeling. By integrating SE(3)-equivariance into our model architectures, we demonstrate that our proposed method exhibits remarkable data efficiency, requiring only 5 to 10 task demonstrations for effective end-to-end training. Furthermore, our approach showcases superior generalizability compared to previous diffusion-based manipulation methods.
    Sparse Federated Training of Object Detection in the Internet of Vehicles. (arXiv:2309.03569v1 [cs.LG])
    As an essential component part of the Intelligent Transportation System (ITS), the Internet of Vehicles (IoV) plays a vital role in alleviating traffic issues. Object detection is one of the key technologies in the IoV, which has been widely used to provide traffic management services by analyzing timely and sensitive vehicle-related information. However, the current object detection methods are mostly based on centralized deep training, that is, the sensitive data obtained by edge devices need to be uploaded to the server, which raises privacy concerns. To mitigate such privacy leakage, we first propose a federated learning-based framework, where well-trained local models are shared in the central server. However, since edge devices usually have limited computing power, plus a strict requirement of low latency in IoVs, we further propose a sparse training process on edge devices, which can effectively lighten the model, and ensure its training efficiency on edge devices, thereby reducing communication overheads. In addition, due to the diverse computing capabilities and dynamic environment, different sparsity rates are applied to edge devices. To further guarantee the performance, we propose, FedWeg, an improved aggregation scheme based on FedAvg, which is designed by the inverse ratio of sparsity rates. Experiments on the real-life dataset using YOLO show that the proposed scheme can achieve the required object detection rate while saving considerable communication costs.
    OpinionGPT: Modelling Explicit Biases in Instruction-Tuned LLMs. (arXiv:2309.03876v1 [cs.CL])
    Instruction-tuned Large Language Models (LLMs) have recently showcased remarkable ability to generate fitting responses to natural language instructions. However, an open research question concerns the inherent biases of trained models and their responses. For instance, if the data used to tune an LLM is dominantly written by persons with a specific political bias, we might expect generated answers to share this bias. Current research work seeks to de-bias such models, or suppress potentially biased answers. With this demonstration, we take a different view on biases in instruction-tuning: Rather than aiming to suppress them, we aim to make them explicit and transparent. To this end, we present OpinionGPT, a web demo in which users can ask questions and select all biases they wish to investigate. The demo will answer this question using a model fine-tuned on text representing each of the selected biases, allowing side-by-side comparison. To train the underlying model, we identified 11 different biases (political, geographic, gender, age) and derived an instruction-tuning corpus in which each answer was written by members of one of these demographics. This paper presents OpinionGPT, illustrates how we trained the bias-aware model and showcases the web application (available at https://opiniongpt.informatik.hu-berlin.de).
    ImageBind-LLM: Multi-modality Instruction Tuning. (arXiv:2309.03905v1 [cs.MM])
    We present ImageBind-LLM, a multi-modality instruction tuning method of large language models (LLMs) via ImageBind. Existing works mainly focus on language and image instruction tuning, different from which, our ImageBind-LLM can respond to multi-modality conditions, including audio, 3D point clouds, video, and their embedding-space arithmetic by only image-text alignment training. During training, we adopt a learnable bind network to align the embedding space between LLaMA and ImageBind's image encoder. Then, the image features transformed by the bind network are added to word tokens of all layers in LLaMA, which progressively injects visual instructions via an attention-free and zero-initialized gating mechanism. Aided by the joint embedding of ImageBind, the simple image-text training enables our model to exhibit superior multi-modality instruction-following capabilities. During inference, the multi-modality inputs are fed into the corresponding ImageBind encoders, and processed by a proposed visual cache model for further cross-modal embedding enhancement. The training-free cache model retrieves from three million image features extracted by ImageBind, which effectively mitigates the training-inference modality discrepancy. Notably, with our approach, ImageBind-LLM can respond to instructions of diverse modalities and demonstrate significant language generation quality. Code is released at https://github.com/OpenGVLab/LLaMA-Adapter.
    Primal-Dual Contextual Bayesian Optimization for Control System Online Optimization with Time-Average Constraints. (arXiv:2304.06104v2 [cs.LG] UPDATED)
    This paper studies the problem of online performance optimization of constrained closed-loop control systems, where both the objective and the constraints are unknown black-box functions affected by exogenous time-varying contextual disturbances. A primal-dual contextual Bayesian optimization algorithm is proposed that achieves sublinear cumulative regret with respect to the dynamic optimal solution under certain regularity conditions. Furthermore, the algorithm achieves zero time-average constraint violation, ensuring that the average value of the constraint function satisfies the desired constraint. The method is applied to both sampled instances from Gaussian processes and a continuous stirred tank reactor parameter tuning problem; simulation results show that the method simultaneously provides close-to-optimal performance and maintains constraint feasibility on average. This contrasts current state-of-the-art methods, which either suffer from large cumulative regret or severe constraint violations for the case studies presented.
    Q-Learning for MDPs with General Spaces: Convergence and Near Optimality via Quantization under Weak Continuity. (arXiv:2111.06781v3 [cs.LG] UPDATED)
    Reinforcement learning algorithms often require finiteness of state and action spaces in Markov decision processes (MDPs) (also called controlled Markov chains) and various efforts have been made in the literature towards the applicability of such algorithms for continuous state and action spaces. In this paper, we show that under very mild regularity conditions (in particular, involving only weak continuity of the transition kernel of an MDP), Q-learning for standard Borel MDPs via quantization of states and actions (called Quantized Q-Learning) converges to a limit, and furthermore this limit satisfies an optimality equation which leads to near optimality with either explicit performance bounds or which are guaranteed to be asymptotically optimal. Our approach builds on (i) viewing quantization as a measurement kernel and thus a quantized MDP as a partially observed Markov decision process (POMDP), (ii) utilizing near optimality and convergence results of Q-learning for POMDPs, and (iii) finally, near-optimality of finite state model approximations for MDPs with weakly continuous kernels which we show to correspond to the fixed point of the constructed POMDP. Thus, our paper presents a very general convergence and approximation result for the applicability of Q-learning for continuous MDPs.
    GPT Can Solve Mathematical Problems Without a Calculator. (arXiv:2309.03241v1 [cs.LG])
    Previous studies have typically assumed that large language models are unable to accurately perform arithmetic operations, particularly multiplication of >8 digits, and operations involving decimals and fractions, without the use of calculator tools. This paper aims to challenge this misconception. With sufficient training data, a 2 billion-parameter language model can accurately perform multi-digit arithmetic operations with almost 100% accuracy without data leakage, significantly surpassing GPT-4 (whose multi-digit multiplication accuracy is only 4.3%). We also demonstrate that our MathGLM, fine-tuned from GLM-10B on a dataset with additional multi-step arithmetic operations and math problems described in text, achieves similar performance to GPT-4 on a 5,000-samples Chinese math problem test set.
    Interactive Hyperparameter Optimization in Multi-Objective Problems via Preference Learning. (arXiv:2309.03581v1 [cs.LG])
    Hyperparameter optimization (HPO) is important to leverage the full potential of machine learning (ML). In practice, users are often interested in multi-objective (MO) problems, i.e., optimizing potentially conflicting objectives, like accuracy and energy consumption. To tackle this, the vast majority of MO-ML algorithms return a Pareto front of non-dominated machine learning models to the user. Optimizing the hyperparameters of such algorithms is non-trivial as evaluating a hyperparameter configuration entails evaluating the quality of the resulting Pareto front. In literature, there are known indicators that assess the quality of a Pareto front (e.g., hypervolume, R2) by quantifying different properties (e.g., volume, proximity to a reference point). However, choosing the indicator that leads to the desired Pareto front might be a hard task for a user. In this paper, we propose a human-centered interactive HPO approach tailored towards multi-objective ML leveraging preference learning to extract desiderata from users that guide the optimization. Instead of relying on the user guessing the most suitable indicator for their needs, our approach automatically learns an appropriate indicator. Concretely, we leverage pairwise comparisons of distinct Pareto fronts to learn such an appropriate quality indicator. Then, we optimize the hyperparameters of the underlying MO-ML algorithm towards this learned indicator using a state-of-the-art HPO approach. In an experimental study targeting the environmental impact of ML, we demonstrate that our approach leads to substantially better Pareto fronts compared to optimizing based on a wrong indicator pre-selected by the user, and performs comparable in the case of an advanced user knowing which indicator to pick.
    Characterizing Lipschitz Stability of GNN for Fairness. (arXiv:2309.03648v1 [cs.LG])
    The Lipschitz bound, a technique from robust statistics, can limit the maximum changes in the output concerning the input, taking into account associated irrelevant biased factors. It is an efficient and provable method for examining the output stability of machine learning models without incurring additional computation costs. Recently, Graph Neural Networks (GNNs), which operate on non-Euclidean data, have gained significant attention. However, no previous research has investigated the GNN Lipschitz bounds to shed light on stabilizing model outputs, especially when working on non-Euclidean data with inherent biases. Given the inherent biases in common graph data used for GNN training, it poses a serious challenge to constraining the GNN output perturbations induced by input biases, thereby safeguarding fairness during training. Recently, despite the Lipschitz constant's use in controlling the stability of Euclideanneural networks, the calculation of the precise Lipschitz constant remains elusive for non-Euclidean neural networks like GNNs, especially within fairness contexts. To narrow this gap, we begin with the general GNNs operating on an attributed graph, and formulate a Lipschitz bound to limit the changes in the output regarding biases associated with the input. Additionally, we theoretically analyze how the Lipschitz constant of a GNN model could constrain the output perturbations induced by biases learned from data for fairness training. We experimentally validate the Lipschitz bound's effectiveness in limiting biases of the model output. Finally, from a training dynamics perspective, we demonstrate why the theoretical Lipschitz bound can effectively guide the GNN training to better trade-off between accuracy and fairness.
    Community-Based Hierarchical Positive-Unlabeled (PU) Model Fusion for Chronic Disease Prediction. (arXiv:2309.03386v1 [cs.LG])
    Positive-Unlabeled (PU) Learning is a challenge presented by binary classification problems where there is an abundance of unlabeled data along with a small number of positive data instances, which can be used to address chronic disease screening problem. State-of-the-art PU learning methods have resulted in the development of various risk estimators, yet they neglect the differences among distinct populations. To address this issue, we present a novel Positive-Unlabeled Learning Tree (PUtree) algorithm. PUtree is designed to take into account communities such as different age or income brackets, in tasks of chronic disease prediction. We propose a novel approach for binary decision-making, which hierarchically builds community-based PU models and then aggregates their deliverables. Our method can explicate each PU model on the tree for the optimized non-leaf PU node splitting. Furthermore, a mask-recovery data augmentation strategy enables sufficient training of the model in individual communities. Additionally, the proposed approach includes an adversarial PU risk estimator to capture hierarchical PU-relationships, and a model fusion network that integrates data from each tree path, resulting in robust binary classification results. We demonstrate the superior performance of PUtree as well as its variants on two benchmarks and a new diabetes-prediction dataset.
    Understanding Self-Supervised Learning of Speech Representation via Invariance and Redundancy Reduction. (arXiv:2309.03619v1 [cs.SD])
    The choice of the objective function is crucial in emerging high-quality representations from self-supervised learning. This paper investigates how different formulations of the Barlow Twins (BT) objective impact downstream task performance for speech data. We propose Modified Barlow Twins (MBT) with normalized latents to enforce scale-invariance and evaluate on speaker identification, gender recognition and keyword spotting tasks. Our results show MBT improves representation generalization over original BT, especially when fine-tuning with limited target data. This highlights the importance of designing objectives that encourage invariant and transferable representations. Our analysis provides insights into how the BT learning objective can be tailored to produce speech representations that excel when adapted to new downstream tasks. This study is an important step towards developing reusable self-supervised speech representations.
    A Natural Gas Consumption Forecasting System for Continual Learning Scenarios based on Hoeffding Trees with Change Point Detection Mechanism. (arXiv:2309.03720v1 [cs.LG])
    Forecasting natural gas consumption, considering seasonality and trends, is crucial in planning its supply and consumption and optimizing the cost of obtaining it, mainly by industrial entities. However, in times of threats to its supply, it is also a critical element that guarantees the supply of this raw material to meet individual consumers' needs, ensuring society's energy security. This article introduces a novel multistep ahead forecasting of natural gas consumption with change point detection integration for model collection selection with continual learning capabilities using data stream processing. The performance of the forecasting models based on the proposed approach is evaluated in a complex real-world use case of natural gas consumption forecasting. We employed Hoeffding tree predictors as forecasting models and the Pruned Exact Linear Time (PELT) algorithm for the change point detection procedure. The change point detection integration enables selecting a different model collection for successive time frames. Thus, three model collection selection procedures (with and without an error feedback loop) are defined and evaluated for forecasting scenarios with various densities of detected change points. These models were compared with change point agnostic baseline approaches. Our experiments show that fewer change points result in a lower forecasting error regardless of the model collection selection procedure employed. Also, simpler model collection selection procedures omitting forecasting error feedback leads to more robust forecasting models suitable for continual learning tasks.
    Large Language Models as Optimizers. (arXiv:2309.03409v1 [cs.LG])
    Optimization is ubiquitous. While derivative-based algorithms have been powerful tools for various problems, the absence of gradient imposes challenges on many real-world applications. In this work, we propose Optimization by PROmpting (OPRO), a simple and effective approach to leverage large language models (LLMs) as optimizers, where the optimization task is described in natural language. In each optimization step, the LLM generates new solutions from the prompt that contains previously generated solutions with their values, then the new solutions are evaluated and added to the prompt for the next optimization step. We first showcase OPRO on linear regression and traveling salesman problems, then move on to prompt optimization where the goal is to find instructions that maximize the task accuracy. With a variety of LLMs, we demonstrate that the best prompts optimized by OPRO outperform human-designed prompts by up to 8% on GSM8K, and by up to 50% on Big-Bench Hard tasks.
    Non-inferiority of Deep Learning Acute Ischemic Stroke Segmentation on Non-Contrast CT Compared to Expert Neuroradiologists. (arXiv:2211.15341v3 [eess.IV] UPDATED)
    To determine if a convolutional neural network (CNN) deep learning model can accurately segment acute ischemic changes on non-contrast CT compared to neuroradiologists. Non-contrast CT (NCCT) examinations from 232 acute ischemic stroke patients who were enrolled in the DEFUSE 3 trial were included in this study. Three experienced neuroradiologists independently segmented hypodensity that reflected the ischemic core on each scan. The neuroradiologist with the most experience (expert A) served as the ground truth for deep learning model training. Two additional neuroradiologists (experts B and C) segmentations were used for data testing. The 232 studies were randomly split into training and test sets. The training set was further randomly divided into 5 folds with training and validation sets. A 3-dimensional CNN architecture was trained and optimized to predict the segmentations of expert A from NCCT. The performance of the model was assessed using a set of volume, overlap, and distance metrics using non-inferiority thresholds of 20%, 3ml, and 3mm. The optimized model trained on expert A was compared to test experts B and C. We used a one-sided Wilcoxon signed-rank test to test for the non-inferiority of the model-expert compared to the inter-expert agreement. The final model performance for the ischemic core segmentation task reached a performance of 0.46+-0.09 Surface Dice at Tolerance 5mm and 0.47+-0.13 Dice when trained on expert A. Compared to the two test neuroradiologists the model-expert agreement was non-inferior to the inter-expert agreement, p < 0.05. The CNN accurately delineates the hypodense ischemic core on NCCT in acute ischemic stroke patients with an accuracy comparable to neuroradiologists.
    Cross-domain Sound Recognition for Efficient Underwater Data Analysis. (arXiv:2309.03451v1 [cs.SD])
    This paper presents a novel deep learning approach for analyzing massive underwater acoustic data by leveraging a model trained on a broad spectrum of non-underwater (aerial) sounds. Recognizing the challenge in labeling vast amounts of underwater data, we propose a two-fold methodology to accelerate this labor-intensive procedure. The first part of our approach involves PCA and UMAP visualization of the underwater data using the feature vectors of an aerial sound recognition model. This enables us to cluster the data in a two dimensional space and listen to points within these clusters to understand their defining characteristics. This innovative method simplifies the process of selecting candidate labels for further training. In the second part, we train a neural network model using both the selected underwater data and the non-underwater dataset. We conducted a quantitative analysis to measure the precision, recall, and F1 score of our model for recognizing airgun sounds, a common type of underwater sound. The F1 score achieved by our model exceeded 84.3%, demonstrating the effectiveness of our approach in analyzing underwater acoustic data. The methodology presented in this paper holds significant potential to reduce the amount of labor required in underwater data analysis and opens up new possibilities for further research in the field of cross-domain data analysis.
    Chat Failures and Troubles: Reasons and Solutions. (arXiv:2309.03708v1 [cs.RO])
    This paper examines some common problems in Human-Robot Interaction (HRI) causing failures and troubles in Chat. A given use case's design decisions start with the suitable robot, the suitable chatting model, identifying common problems that cause failures, identifying potential solutions, and planning continuous improvement. In conclusion, it is recommended to use a closed-loop control algorithm that guides the use of trained Artificial Intelligence (AI) pre-trained models and provides vocabulary filtering, re-train batched models on new datasets, learn online from data streams, and/or use reinforcement learning models to self-update the trained models and reduce errors.
    Reduced Simulations for High-Energy Physics, a Middle Ground for Data-Driven Physics Research. (arXiv:2309.03780v1 [hep-ex])
    Subatomic particle track reconstruction (tracking) is a vital task in High-Energy Physics experiments. Tracking is exceptionally computationally challenging and fielded solutions, relying on traditional algorithms, do not scale linearly. Machine Learning (ML) assisted solutions are a promising answer. We argue that a complexity-reduced problem description and the data representing it, will facilitate the solution exploration workflow. We provide the REDuced VIrtual Detector (REDVID) as a complexity-reduced detector model and particle collision event simulator combo. REDVID is intended as a simulation-in-the-loop, to both generate synthetic data efficiently and to simplify the challenge of ML model design. The fully parametric nature of our tool, with regards to system-level configuration, while in contrast to physics-accurate simulations, allows for the generation of simplified data for research and education, at different levels. Resulting from the reduced complexity, we showcase the computational efficiency of REDVID by providing the computational cost figures for a multitude of simulation benchmarks. As a simulation and a generative tool for ML-assisted solution design, REDVID is highly flexible, reusable and open-source. Reference data sets generated with REDVID are publicly available.
    Improved theoretical guarantee for rank aggregation via spectral method. (arXiv:2309.03808v1 [stat.ML])
    Given pairwise comparisons between multiple items, how to rank them so that the ranking matches the observations? This problem, known as rank aggregation, has found many applications in sports, recommendation systems, and other web applications. As it is generally NP-hard to find a global ranking that minimizes the mismatch (known as the Kemeny optimization), we focus on the Erd\"os-R\'enyi outliers (ERO) model for this ranking problem. Here, each pairwise comparison is a corrupted copy of the true score difference. We investigate spectral ranking algorithms that are based on unnormalized and normalized data matrices. The key is to understand their performance in recovering the underlying scores of each item from the observed data. This reduces to deriving an entry-wise perturbation error bound between the top eigenvectors of the unnormalized/normalized data matrix and its population counterpart. By using the leave-one-out technique, we provide a sharper $\ell_{\infty}$-norm perturbation bound of the eigenvectors and also derive an error bound on the maximum displacement for each item, with only $\Omega(n\log n)$ samples. Our theoretical analysis improves upon the state-of-the-art results in terms of sample complexity, and our numerical experiments confirm these theoretical findings.
    Uncovering Drift in Textual Data: An Unsupervised Method for Detecting and Mitigating Drift in Machine Learning Models. (arXiv:2309.03831v1 [cs.CL])
    Drift in machine learning refers to the phenomenon where the statistical properties of data or context, in which the model operates, change over time leading to a decrease in its performance. Therefore, maintaining a constant monitoring process for machine learning model performance is crucial in order to proactively prevent any potential performance regression. However, supervised drift detection methods require human annotation and consequently lead to a longer time to detect and mitigate the drift. In our proposed unsupervised drift detection method, we follow a two step process. Our first step involves encoding a sample of production data as the target distribution, and the model training data as the reference distribution. In the second step, we employ a kernel-based statistical test that utilizes the maximum mean discrepancy (MMD) distance metric to compare the reference and target distributions and estimate any potential drift. Our method also identifies the subset of production data that is the root cause of the drift. The models retrained using these identified high drift samples show improved performance on online customer experience quality metrics.
    Cross-Image Context Matters for Bongard Problems. (arXiv:2309.03468v1 [cs.CV])
    Current machine learning methods struggle to solve Bongard problems, which are a type of IQ test that requires deriving an abstract "concept" from a set of positive and negative "support" images, and then classifying whether or not a new query image depicts the key concept. On Bongard-HOI, a benchmark for natural-image Bongard problems, existing methods have only reached 66% accuracy (where chance is 50%). Low accuracy is often attributed to neural nets' lack of ability to find human-like symbolic rules. In this work, we point out that many existing methods are forfeiting accuracy due to a much simpler problem: they do not incorporate information contained in the support set as a whole, and rely instead on information extracted from individual supports. This is a critical issue, because unlike in few-shot learning tasks concerning object classification, the "key concept" in a typical Bongard problem can only be distinguished using multiple positives and multiple negatives. We explore a variety of simple methods to take this cross-image context into account, and demonstrate substantial gains over prior methods, leading to new state-of-the-art performance on Bongard-LOGO (75.3%) and Bongard-HOI (72.45%) and strong performance on the original Bongard problem set (60.84%).
    A Causal Perspective on Loan Pricing: Investigating the Impacts of Selection Bias on Identifying Bid-Response Functions. (arXiv:2309.03730v1 [cs.LG])
    In lending, where prices are specific to both customers and products, having a well-functioning personalized pricing policy in place is essential to effective business making. Typically, such a policy must be derived from observational data, which introduces several challenges. While the problem of ``endogeneity'' is prominently studied in the established pricing literature, the problem of selection bias (or, more precisely, bid selection bias) is not. We take a step towards understanding the effects of selection bias by posing pricing as a problem of causal inference. Specifically, we consider the reaction of a customer to price a treatment effect. In our experiments, we simulate varying levels of selection bias on a semi-synthetic dataset on mortgage loan applications in Belgium. We investigate the potential of parametric and nonparametric methods for the identification of individual bid-response functions. Our results illustrate how conventional methods such as logistic regression and neural networks suffer adversely from selection bias. In contrast, we implement state-of-the-art methods from causal machine learning and show their capability to overcome selection bias in pricing data.
    Broadband Ground Motion Synthesis via Generative Adversarial Neural Operators: Development and Validation. (arXiv:2309.03447v1 [physics.geo-ph])
    We present a data-driven model for ground-motion synthesis using a Generative Adversarial Neural Operator (GANO) that combines recent advancements in machine learning and open access strong motion data sets to generate three-component acceleration time histories conditioned on moment magnitude ($M$), rupture distance ($R_{rup}$), time-average shear-wave velocity at the top $30m$ ($V_{S30}$), and tectonic environment or style of faulting. We use Neural Operators, a resolution invariant architecture that guarantees that the model training is independent of the data sampling frequency. We first present the conditional ground-motion synthesis algorithm (referred to heretofore as cGM-GANO) and discuss its advantages compared to previous work. Next, we verify the cGM-GANO framework using simulated ground motions generated with the Southern California Earthquake Center (SCEC) Broadband Platform (BBP). We lastly train cGM-GANO on a KiK-net dataset from Japan, showing that the framework can recover the magnitude, distance, and $V_{S30}$ scaling of Fourier amplitude and pseudo-spectral accelerations. We evaluate cGM-GANO through residual analysis with the empirical dataset as well as by comparison with conventional Ground Motion Models (GMMs) for selected ground motion scenarios. Results show that cGM-GANO produces consistent median scaling with the GMMs for the corresponding tectonic environments. The largest misfit is observed at short distances due to the scarcity of training data. With the exception of short distances, the aleatory variability of the response spectral ordinates is also well captured, especially for subduction events due to the adequacy of training data. Applications of the presented framework include generation of risk-targeted ground motions for site-specific engineering applications.
    Filtration Surfaces for Dynamic Graph Classification. (arXiv:2309.03616v1 [cs.LG])
    Existing approaches for classifying dynamic graphs either lift graph kernels to the temporal domain, or use graph neural networks (GNNs). However, current baselines have scalability issues, cannot handle a changing node set, or do not take edge weight information into account. We propose filtration surfaces, a novel method that is scalable and flexible, to alleviate said restrictions. We experimentally validate the efficacy of our model and show that filtration surfaces outperform previous state-of-the-art baselines on datasets that rely on edge weight information. Our method does so while being either completely parameter-free or having at most one parameter, and yielding the lowest overall standard deviation.
    Testing properties of distributions in the streaming model. (arXiv:2309.03245v1 [cs.DS])
    We study distribution testing in the standard access model and the conditional access model when the memory available to the testing algorithm is bounded. In both scenarios, the samples appear in an online fashion and the goal is to test the properties of distribution using an optimal number of samples subject to a memory constraint on how many samples can be stored at a given time. First, we provide a trade-off between the sample complexity and the space complexity for testing identity when the samples are drawn according to the conditional access oracle. We then show that we can learn a succinct representation of a monotone distribution efficiently with a memory constraint on the number of samples that are stored that is almost optimal. We also show that the algorithm for monotone distributions can be extended to a larger class of decomposable distributions.
    A Robust Negative Learning Approach to Partial Domain Adaptation Using Source Prototypes. (arXiv:2309.03531v1 [cs.CV])
    This work proposes a robust Partial Domain Adaptation (PDA) framework that mitigates the negative transfer problem by incorporating a robust target-supervision strategy. It leverages ensemble learning and includes diverse, complementary label feedback, alleviating the effect of incorrect feedback and promoting pseudo-label refinement. Rather than relying exclusively on first-order moments for distribution alignment, our approach offers explicit objectives to optimize intra-class compactness and inter-class separation with the inferred source prototypes and highly-confident target samples in a domain-invariant fashion. Notably, we ensure source data privacy by eliminating the need to access the source data during the adaptation phase through a priori inference of source prototypes. We conducted a series of comprehensive experiments, including an ablation analysis, covering a range of partial domain adaptation tasks. Comprehensive evaluations on benchmark datasets corroborate our framework's enhanced robustness and generalization, demonstrating its superiority over existing state-of-the-art PDA approaches.
    Knowledge Graphs in Practice: Characterizing their Users, Challenges, and Visualization Opportunities. (arXiv:2304.01311v3 [cs.HC] UPDATED)
    This study presents insights from interviews with nineteen Knowledge Graph (KG) practitioners who work in both enterprise and academic settings on a wide variety of use cases. Through this study, we identify critical challenges experienced by KG practitioners when creating, exploring, and analyzing KGs that could be alleviated through visualization design. Our findings reveal three major personas among KG practitioners - KG Builders, Analysts, and Consumers - each of whom have their own distinct expertise and needs. We discover that KG Builders would benefit from schema enforcers, while KG Analysts need customizable query builders that provide interim query results. For KG Consumers, we identify a lack of efficacy for node-link diagrams, and the need for tailored domain-specific visualizations to promote KG adoption and comprehension. Lastly, we find that implementing KGs effectively in practice requires both technical and social solutions that are not addressed with current tools, technologies, and collaborative workflows. From the analysis of our interviews, we distill several visualization research directions to improve KG usability, including knowledge cards that balance digestibility and discoverability, timeline views to track temporal changes, interfaces that support organic discovery, and semantic explanations for AI and machine learning predictions.
    LDMRes-Net: Enabling Efficient Medical Image Segmentation on IoT and Edge Platforms. (arXiv:2306.06145v2 [eess.IV] UPDATED)
    In this study, we propose LDMRes-Net, a lightweight dual-multiscale residual block-based computational neural network tailored for medical image segmentation on IoT and edge platforms. Conventional U-Net-based models face challenges in meeting the speed and efficiency demands of real-time clinical applications, such as disease monitoring, radiation therapy, and image-guided surgery. LDMRes-Net overcomes these limitations with its remarkably low number of learnable parameters (0.072M), making it highly suitable for resource-constrained devices. The model's key innovation lies in its dual multi-residual block architecture, which enables the extraction of refined features on multiple scales, enhancing overall segmentation performance. To further optimize efficiency, the number of filters is carefully selected to prevent overlap, reduce training time, and improve computational efficiency. The study includes comprehensive evaluations, focusing on segmentation of the retinal image of vessels and hard exudates crucial for the diagnosis and treatment of ophthalmology. The results demonstrate the robustness, generalizability, and high segmentation accuracy of LDMRes-Net, positioning it as an efficient tool for accurate and rapid medical image segmentation in diverse clinical applications, particularly on IoT and edge platforms. Such advances hold significant promise for improving healthcare outcomes and enabling real-time medical image analysis in resource-limited settings.
    Efficient Single Object Detection on Image Patches with Early Exit Enhanced High-Precision CNNs. (arXiv:2309.03530v1 [cs.CV])
    This paper proposes a novel approach for detecting objects using mobile robots in the context of the RoboCup Standard Platform League, with a primary focus on detecting the ball. The challenge lies in detecting a dynamic object in varying lighting conditions and blurred images caused by fast movements. To address this challenge, the paper presents a convolutional neural network architecture designed specifically for computationally constrained robotic platforms. The proposed CNN is trained to achieve high precision classification of single objects in image patches and to determine their precise spatial positions. The paper further integrates Early Exits into the existing high-precision CNN architecture to reduce the computational cost of easily rejectable cases in the background class. The training process involves a composite loss function based on confidence and positional losses with dynamic weighting and data augmentation. The proposed approach achieves a precision of 100% on the validation dataset and a recall of almost 87%, while maintaining an execution time of around 170 $\mu$s per hypotheses. By combining the proposed approach with an Early Exit, a runtime optimization of more than 28%, on average, can be achieved compared to the original CNN. Overall, this paper provides an efficient solution for an enhanced detection of objects, especially the ball, in computationally constrained robotic platforms.
    Source Camera Identification and Detection in Digital Videos through Blind Forensics. (arXiv:2309.03353v1 [cs.CV])
    Source camera identification in digital videos is the problem of associating an unknown digital video with its source device, within a closed set of possible devices. The existing techniques in source detection of digital videos try to find a fingerprint of the actual source in the video in form of PRNU (Photo Response Non--Uniformity), and match it against the SPN (Sensor Pattern Noise) of each possible device. The highest correlation indicates the correct source. We investigate the problem of identifying a video source through a feature based approach using machine learning. In this paper, we present a blind forensic technique of video source authentication and identification, based on feature extraction, feature selection and subsequent source classification. The main aim is to determine whether a claimed source for a video is actually its original source. If not, we identify its original source. Our experimental results prove the efficiency of the proposed method compared to traditional fingerprint based technique.
    Byzantine-Robust Federated Learning with Variance Reduction and Differential Privacy. (arXiv:2309.03437v1 [cs.LG])
    Federated learning (FL) is designed to preserve data privacy during model training, where the data remains on the client side (i.e., IoT devices), and only model updates of clients are shared iteratively for collaborative learning. However, this process is vulnerable to privacy attacks and Byzantine attacks: the local model updates shared throughout the FL network will leak private information about the local training data, and they can also be maliciously crafted by Byzantine attackers to disturb the learning. In this paper, we propose a new FL scheme that guarantees rigorous privacy and simultaneously enhances system robustness against Byzantine attacks. Our approach introduces sparsification- and momentum-driven variance reduction into the client-level differential privacy (DP) mechanism, to defend against Byzantine attackers. The security design does not violate the privacy guarantee of the client-level DP mechanism; hence, our approach achieves the same client-level DP guarantee as the state-of-the-art. We conduct extensive experiments on both IID and non-IID datasets and different tasks and evaluate the performance of our approach against different Byzantine attacks by comparing it with state-of-the-art defense methods. The results of our experiments show the efficacy of our framework and demonstrate its ability to improve system robustness against Byzantine attacks while achieving a strong privacy guarantee.
    Automated Bioinformatics Analysis via AutoBA. (arXiv:2309.03242v1 [q-bio.GN])
    With the fast-growing and evolving omics data, the demand for streamlined and adaptable tools to handle the analysis continues to grow. In response to this need, we introduce Auto Bioinformatics Analysis (AutoBA), an autonomous AI agent based on a large language model designed explicitly for conventional omics data analysis. AutoBA simplifies the analytical process by requiring minimal user input while delivering detailed step-by-step plans for various bioinformatics tasks. Through rigorous validation by expert bioinformaticians, AutoBA's robustness and adaptability are affirmed across a diverse range of omics analysis cases, including whole genome sequencing (WGS), RNA sequencing (RNA-seq), single-cell RNA-seq, ChIP-seq, and spatial transcriptomics. AutoBA's unique capacity to self-design analysis processes based on input data variations further underscores its versatility. Compared with online bioinformatic services, AutoBA deploys the analysis locally, preserving data privacy. Moreover, different from the predefined pipeline, AutoBA has adaptability in sync with emerging bioinformatics tools. Overall, AutoBA represents a convenient tool, offering robustness and adaptability for complex omics data analysis.
    Which algorithm to select in sports timetabling?. (arXiv:2309.03229v1 [cs.AI])
    Any sports competition needs a timetable, specifying when and where teams meet each other. The recent International Timetabling Competition (ITC2021) on sports timetabling showed that, although it is possible to develop general algorithms, the performance of each algorithm varies considerably over the problem instances. This paper provides an instance space analysis for sports timetabling, resulting in powerful insights into the strengths and weaknesses of eight state-of-the-art algorithms. Based on machine learning techniques, we propose an algorithm selection system that predicts which algorithm is likely to perform best when given the characteristics of a sports timetabling problem instance. Furthermore, we identify which characteristics are important in making that prediction, providing insights in the performance of the algorithms, and suggestions to further improve them. Finally, we assess the empirical hardness of the instances. Our results are based on large computational experiments involving about 50 years of CPU time on more than 500 newly generated problem instances.
    Quantum-AI empowered Intelligent Surveillance: Advancing Public Safety Through Innovative Contraband Detection. (arXiv:2309.03231v1 [quant-ph])
    Surveillance systems have emerged as crucial elements in upholding peace and security in the modern world. Their ubiquity aids in monitoring suspicious activities effectively. However, in densely populated environments, continuous active monitoring becomes impractical, necessitating the development of intelligent surveillance systems. AI integration in the surveillance domain was a big revolution, however, speed issues have prevented its widespread implementation in the field. It has been observed that quantum artificial intelligence has led to a great breakthrough. Quantum artificial intelligence-based surveillance systems have shown to be more accurate as well as capable of performing well in real-time scenarios, which had never been seen before. In this research, a RentinaNet model is integrated with Quantum CNN and termed as Quantum-RetinaNet. By harnessing the Quantum capabilities of QCNN, Quantum-RetinaNet strikes a balance between accuracy and speed. This innovative integration positions it as a game-changer, addressing the challenges of active monitoring in densely populated scenarios. As demand for efficient surveillance solutions continues to grow, Quantum-RetinaNet offers a compelling alternative to existing CNN models, upholding accuracy standards without sacrificing real-time performance. The unique attributes of Quantum-RetinaNet have far-reaching implications for the future of intelligent surveillance. With its enhanced processing speed, it is poised to revolutionize the field, catering to the pressing need for rapid yet precise monitoring. As Quantum-RetinaNet becomes the new standard, it ensures public safety and security while pushing the boundaries of AI in surveillance.
    Relay Diffusion: Unifying diffusion process across resolutions for image synthesis. (arXiv:2309.03350v1 [cs.CV])
    Diffusion models achieved great success in image synthesis, but still face challenges in high-resolution generation. Through the lens of discrete cosine transformation, we find the main reason is that \emph{the same noise level on a higher resolution results in a higher Signal-to-Noise Ratio in the frequency domain}. In this work, we present Relay Diffusion Model (RDM), which transfers a low-resolution image or noise into an equivalent high-resolution one for diffusion model via blurring diffusion and block noise. Therefore, the diffusion process can continue seamlessly in any new resolution or model without restarting from pure noise or low-resolution conditioning. RDM achieves state-of-the-art FID on CelebA-HQ and sFID on ImageNet 256$\times$256, surpassing previous works such as ADM, LDM and DiT by a large margin. All the codes and checkpoints are open-sourced at \url{https://github.com/THUDM/RelayDiffusion}.
    Personalized Tucker Decomposition: Modeling Commonality and Peculiarity on Tensor Data. (arXiv:2309.03439v1 [cs.LG])
    We propose personalized Tucker decomposition (perTucker) to address the limitations of traditional tensor decomposition methods in capturing heterogeneity across different datasets. perTucker decomposes tensor data into shared global components and personalized local components. We introduce a mode orthogonality assumption and develop a proximal gradient regularized block coordinate descent algorithm that is guaranteed to converge to a stationary point. By learning unique and common representations across datasets, we demonstrate perTucker's effectiveness in anomaly detection, client classification, and clustering through a simulation study and two case studies on solar flare detection and tonnage signal classification.
    ViewMix: Augmentation for Robust Representation in Self-Supervised Learning. (arXiv:2309.03360v1 [cs.CV])
    Joint Embedding Architecture-based self-supervised learning methods have attributed the composition of data augmentations as a crucial factor for their strong representation learning capabilities. While regional dropout strategies have proven to guide models to focus on lesser indicative parts of the objects in supervised methods, it hasn't been adopted by self-supervised methods for generating positive pairs. This is because the regional dropout methods are not suitable for the input sampling process of the self-supervised methodology. Whereas dropping informative pixels from the positive pairs can result in inefficient training, replacing patches of a specific object with a different one can steer the model from maximizing the agreement between different positive pairs. Moreover, joint embedding representation learning methods have not made robustness their primary training outcome. To this end, we propose the ViewMix augmentation policy, specially designed for self-supervised learning, upon generating different views of the same image, patches are cut and pasted from one view to another. By leveraging the different views created by this augmentation strategy, multiple joint embedding-based self-supervised methodologies obtained better localization capability and consistently outperformed their corresponding baseline methods. It is also demonstrated that incorporating ViewMix augmentation policy promotes robustness of the representations in the state-of-the-art methods. Furthermore, our experimentation and analysis of compute times suggest that ViewMix augmentation doesn't introduce any additional overhead compared to other counterparts.
    Towards Comparable Knowledge Distillation in Semantic Image Segmentation. (arXiv:2309.03659v1 [cs.CV])
    Knowledge Distillation (KD) is one proposed solution to large model sizes and slow inference speed in semantic segmentation. In our research we identify 25 proposed distillation loss terms from 14 publications in the last 4 years. Unfortunately, a comparison of terms based on published results is often impossible, because of differences in training configurations. A good illustration of this problem is the comparison of two publications from 2022. Using the same models and dataset, Structural and Statistical Texture Distillation (SSTKD) reports an increase of student mIoU of 4.54 and a final performance of 29.19, while Adaptive Perspective Distillation (APD) only improves student performance by 2.06 percentage points, but achieves a final performance of 39.25. The reason for such extreme differences is often a suboptimal choice of hyperparameters and a resulting underperformance of the student model used as reference point. In our work, we reveal problems of insufficient hyperparameter tuning by showing that distillation improvements of two widely accepted frameworks, SKD and IFVD, vanish when hyperparameters are optimized sufficiently. To improve comparability of future research in the field, we establish a solid baseline for three datasets and two student models and provide extensive information on hyperparameter tuning. We find that only two out of eight techniques can compete with our simple baseline on the ADE20K dataset.
    Multi-Modality Guidance Network For Missing Modality Inference. (arXiv:2309.03452v1 [cs.CV])
    Multimodal models have gained significant success in recent years. Standard multimodal approaches often assume unchanged modalities from training stage to inference stage. In practice, however, many scenarios fail to satisfy such assumptions with missing modalities during inference, leading to limitations on where multimodal models can be applied. While existing methods mitigate the problem through reconstructing the missing modalities, it increases unnecessary computational cost, which could be just as critical, especially for large, deployed systems. To solve the problem from both sides, we propose a novel guidance network that promotes knowledge sharing during training, taking advantage of the multimodal representations to train better single-modality models for inference. Real-life experiment in violence detection shows that our proposed framework trains single-modality models that significantly outperform its traditionally trained counterparts while maintaining the same inference cost.
    Robotic Table Tennis: A Case Study into a High Speed Learning System. (arXiv:2309.03315v1 [cs.RO])
    We present a deep-dive into a real-world robotic learning system that, in previous work, was shown to be capable of hundreds of table tennis rallies with a human and has the ability to precisely return the ball to desired targets. This system puts together a highly optimized perception subsystem, a high-speed low-latency robot controller, a simulation paradigm that can prevent damage in the real world and also train policies for zero-shot transfer, and automated real world environment resets that enable autonomous training and evaluation on physical robots. We complement a complete system description, including numerous design decisions that are typically not widely disseminated, with a collection of studies that clarify the importance of mitigating various sources of latency, accounting for training and deployment distribution shifts, robustness of the perception system, sensitivity to policy hyper-parameters, and choice of action space. A video demonstrating the components of the system and details of experimental results can be found at https://youtu.be/uFcnWjB42I0.
    No Train Still Gain. Unleash Mathematical Reasoning of Large Language Models with Monte Carlo Tree Search Guided by Energy Function. (arXiv:2309.03224v1 [cs.AI])
    Large language models (LLMs) exhibit impressive language understanding and in-context learning abilities including natural language processing (NLP) tasks and challenging mathematical reasoning. However, due to the lack of process-supervision, applying PLMs to mathematical reasoning tasks often fail to generate correct reasoning steps and final answer even though solutions have high probabilities. To unleash the mathematical reasoning of finetuned-LLMs without any further fineutuning steps, we propose a method to endow LLMs with immediate reaction and delicate reasoning system via Monte Carlo Tree Search(MCTS) and a light energy function to rank the decision steps. In particular, We first re-formalize the finetuned-LLMs to a Residual-based Energy Model~(Residual-EBM) and apply noise contrastive estimation to estimate the parameters of energy function . Then we use MCTS with energy function as path verifier to search the output space and evaluating the reasoning path. Through extensive experiments on two mathematical reasoning benchmarks, namely GSM8k and MATH, we reveal the extraordinary capabilities of our method that improve the pass@1 of the finetuned-model without further finetuning or RLHF alignment by a substantial margin.
    Federated Learning Over Images: Vertical Decompositions and Pre-Trained Backbones Are Difficult to Beat. (arXiv:2309.03237v1 [cs.LG])
    We carefully evaluate a number of algorithms for learning in a federated environment, and test their utility for a variety of image classification tasks. We consider many issues that have not been adequately considered before: whether learning over data sets that do not have diverse sets of images affects the results; whether to use a pre-trained feature extraction "backbone"; how to evaluate learner performance (we argue that classification accuracy is not enough), among others. Overall, across a wide variety of settings, we find that vertically decomposing a neural network seems to give the best results, and outperforms more standard reconciliation-used methods.
    Using Neural Networks for Fast SAR Roughness Estimation of High Resolution Images. (arXiv:2309.03351v1 [cs.CV])
    The analysis of Synthetic Aperture Radar (SAR) imagery is an important step in remote sensing applications, and it is a challenging problem due to its inherent speckle noise. One typical solution is to model the data using the $G_I^0$ distribution and extract its roughness information, which in turn can be used in posterior imaging tasks, such as segmentation, classification and interpretation. This leads to the need of quick and reliable estimation of the roughness parameter from SAR data, especially with high resolution images. Unfortunately, traditional parameter estimation procedures are slow and prone to estimation failures. In this work, we proposed a neural network-based estimation framework that first learns how to predict underlying parameters of $G_I^0$ samples and then can be used to estimate the roughness of unseen data. We show that this approach leads to an estimator that is quicker, yields less estimation error and is less prone to failures than the traditional estimation procedures for this problem, even when we use a simple network. More importantly, we show that this same methodology can be generalized to handle image inputs and, even if trained on purely synthetic data for a few seconds, is able to perform real time pixel-wise roughness estimation for high resolution real SAR imagery.
    A Probabilistic Semi-Supervised Approach with Triplet Markov Chains. (arXiv:2309.03707v1 [stat.ML])
    Triplet Markov chains are general generative models for sequential data which take into account three kinds of random variables: (noisy) observations, their associated discrete labels and latent variables which aim at strengthening the distribution of the observations and their associated labels. However, in practice, we do not have at our disposal all the labels associated to the observations to estimate the parameters of such models. In this paper, we propose a general framework based on a variational Bayesian inference to train parameterized triplet Markov chain models in a semi-supervised context. The generality of our approach enables us to derive semi-supervised algorithms for a variety of generative models for sequential Bayesian classification.
    TSGBench: Time Series Generation Benchmark. (arXiv:2309.03755v1 [cs.LG])
    Synthetic Time Series Generation (TSG) is crucial in a range of applications, including data augmentation, anomaly detection, and privacy preservation. Although significant strides have been made in this field, existing methods exhibit three key limitations: (1) They often benchmark against similar model types, constraining a holistic view of performance capabilities. (2) The use of specialized synthetic and private datasets introduces biases and hampers generalizability. (3) Ambiguous evaluation measures, often tied to custom networks or downstream tasks, hinder consistent and fair comparison. To overcome these limitations, we introduce \textsf{TSGBench}, the inaugural TSG Benchmark, designed for a unified and comprehensive assessment of TSG methods. It comprises three modules: (1) a curated collection of publicly available, real-world datasets tailored for TSG, together with a standardized preprocessing pipeline; (2) a comprehensive evaluation measures suite including vanilla measures, new distance-based assessments, and visualization tools; (3) a pioneering generalization test rooted in Domain Adaptation (DA), compatible with all methods. We have conducted extensive experiments across ten real-world datasets from diverse domains, utilizing ten advanced TSG methods and twelve evaluation measures, all gauged through \textsf{TSGBench}. The results highlight its remarkable efficacy and consistency. More importantly, \textsf{TSGBench} delivers a statistical breakdown of method rankings, illuminating performance variations across different datasets and measures, and offering nuanced insights into the effectiveness of each method.
    EGIC: Enhanced Low-Bit-Rate Generative Image Compression Guided by Semantic Segmentation. (arXiv:2309.03244v1 [eess.IV])
    We introduce EGIC, a novel generative image compression method that allows traversing the distortion-perception curve efficiently from a single model. Specifically, we propose an implicitly encoded variant of image interpolation that predicts the residual between a MSE-optimized and GAN-optimized decoder output. On the receiver side, the user can then control the impact of the residual on the GAN-based reconstruction. Together with improved GAN-based building blocks, EGIC outperforms a wide-variety of perception-oriented and distortion-oriented baselines, including HiFiC, MRIC and DIRAC, while performing almost on par with VTM-20.0 on the distortion end. EGIC is simple to implement, very lightweight (e.g. 0.18x model parameters compared to HiFiC) and provides excellent interpolation characteristics, which makes it a promising candidate for practical applications targeting the low bit range.
    BoXHED2.0: Scalable boosting of dynamic survival analysis. (arXiv:2103.12591v5 [cs.LG] UPDATED)
    Modern applications of survival analysis increasingly involve time-dependent covariates. The Python package BoXHED2.0 is a tree-boosted hazard estimator that is fully nonparametric, and is applicable to survival settings far more general than right-censoring, including recurring events and competing risks. BoXHED2.0 is also scalable to the point of being on the same order of speed as parametric boosted survival models, in part because its core is written in C++ and it also supports the use of GPUs and multicore CPUs. BoXHED2.0 is available from PyPI and also from www.github.com/BoXHED.
    Domain Generalization for Mammographic Image Analysis with Contrastive Learning. (arXiv:2304.10226v5 [cs.CV] UPDATED)
    The deep learning technique has been shown to be effectively addressed several image analysis tasks in the computer-aided diagnosis scheme for mammography. The training of an efficacious deep learning model requires large data with diverse styles and qualities. The diversity of data often comes from the use of various scanners of vendors. But, in practice, it is impractical to collect a sufficient amount of diverse data for training. To this end, a novel contrastive learning is developed to equip the deep learning models with better style generalization capability. Specifically, the multi-style and multi-view unsupervised self-learning scheme is carried out to seek robust feature embedding against style diversity as a pretrained model. Afterward, the pretrained network is further fine-tuned to the downstream tasks, e.g., mass detection, matching, BI-RADS rating, and breast density classification. The proposed method has been evaluated extensively and rigorously with mammograms from various vendor style domains and several public datasets. The experimental results suggest that the proposed domain generalization method can effectively improve performance of four mammographic image tasks on the data from both seen and unseen domains, and outperform many state-of-the-art (SOTA) generalization methods.
    Adversarially Robust Deep Learning with Optimal-Transport-Regularized Divergences. (arXiv:2309.03791v1 [cs.LG])
    We introduce the $ARMOR_D$ methods as novel approaches to enhancing the adversarial robustness of deep learning models. These methods are based on a new class of optimal-transport-regularized divergences, constructed via an infimal convolution between an information divergence and an optimal-transport (OT) cost. We use these as tools to enhance adversarial robustness by maximizing the expected loss over a neighborhood of distributions, a technique known as distributionally robust optimization. Viewed as a tool for constructing adversarial samples, our method allows samples to be both transported, according to the OT cost, and re-weighted, according to the information divergence. We demonstrate the effectiveness of our method on malware detection and image recognition applications and find that, to our knowledge, it outperforms existing methods at enhancing the robustness against adversarial attacks. $ARMOR_D$ yields the robustified accuracy of $98.29\%$ against $FGSM$ and $98.18\%$ against $PGD^{40}$ on the MNIST dataset, reducing the error rate by more than $19.7\%$ and $37.2\%$ respectively compared to prior methods. Similarly, in malware detection, a discrete (binary) data domain, $ARMOR_D$ improves the robustified accuracy under $rFGSM^{50}$ attack compared to the previous best-performing adversarial training methods by $37.0\%$ while lowering false negative and false positive rates by $51.1\%$ and $57.53\%$, respectively.
    Towards Personalized Federated Learning via Heterogeneous Model Reassembly. (arXiv:2308.08643v2 [cs.LG] UPDATED)
    This paper focuses on addressing the practical yet challenging problem of model heterogeneity in federated learning, where clients possess models with different network structures. To track this problem, we propose a novel framework called pFedHR, which leverages heterogeneous model reassembly to achieve personalized federated learning. In particular, we approach the problem of heterogeneous model personalization as a model-matching optimization task on the server side. Moreover, pFedHR automatically and dynamically generates informative and diverse personalized candidates with minimal human intervention. Furthermore, our proposed heterogeneous model reassembly technique mitigates the adverse impact introduced by using public data with different distributions from the client data to a certain extent. Experimental results demonstrate that pFedHR outperforms baselines on three datasets under both IID and Non-IID settings. Additionally, pFedHR effectively reduces the adverse impact of using different public data and dynamically generates diverse personalized models in an automated manner.
    Learning continuous-valued treatment effects through representation balancing. (arXiv:2309.03731v1 [cs.LG])
    Estimating the effects of treatments with an associated dose on an instance's outcome, the "dose response", is relevant in a variety of domains, from healthcare to business, economics, and beyond. Such effects, also known as continuous-valued treatment effects, are typically estimated from observational data, which may be subject to dose selection bias. This means that the allocation of doses depends on pre-treatment covariates. Previous studies have shown that conventional machine learning approaches fail to learn accurate individual estimates of dose responses under the presence of dose selection bias. In this work, we propose CBRNet, a causal machine learning approach to estimate an individual dose response from observational data. CBRNet adopts the Neyman-Rubin potential outcome framework and extends the concept of balanced representation learning for overcoming selection bias to continuous-valued treatments. Our work is the first to apply representation balancing in a continuous-valued treatment setting. We evaluate our method on a newly proposed benchmark. Our experiments demonstrate CBRNet's ability to accurately learn treatment effects under selection bias and competitive performance with respect to other state-of-the-art methods.
    Graph Fairing Convolutional Networks for Anomaly Detection. (arXiv:2010.10274v2 [cs.LG] UPDATED)
    Graph convolution is a fundamental building block for many deep neural networks on graph-structured data. In this paper, we introduce a simple, yet very effective graph convolutional network with skip connections for semi-supervised anomaly detection. The proposed layerwise propagation rule of our model is theoretically motivated by the concept of implicit fairing in geometry processing, and comprises a graph convolution module for aggregating information from immediate node neighbors and a skip connection module for combining layer-wise neighborhood representations. This propagation rule is derived from the iterative solution of the implicit fairing equation via the Jacobi method. In addition to capturing information from distant graph nodes through skip connections between the network's layers, our approach exploits both the graph structure and node features for learning discriminative node representations. These skip connections are integrated by design in our proposed network architecture. The effectiveness of our model is demonstrated through extensive experiments on five benchmark datasets, achieving better or comparable anomaly detection results against strong baseline methods. We also demonstrate through an ablation study that skip connection helps improve the model performance.
    Continual Pre-Training of Large Language Models: How to (re)warm your model?. (arXiv:2308.04014v2 [cs.CL] UPDATED)
    Large language models (LLMs) are routinely pre-trained on billions of tokens, only to restart the process over again once new data becomes available. A much cheaper and more efficient solution would be to enable the continual pre-training of these models, i.e. updating pre-trained models with new data instead of re-training them from scratch. However, the distribution shift induced by novel data typically results in degraded performance on past data. Taking a step towards efficient continual pre-training, in this work, we examine the effect of different warm-up strategies. Our hypothesis is that the learning rate must be re-increased to improve compute efficiency when training on a new dataset. We study the warmup phase of models pre-trained on the Pile (upstream data, 300B tokens) as we continue to pre-train on SlimPajama (downstream data, 297B tokens), following a linear warmup and cosine decay schedule. We conduct all experiments on the Pythia 410M language model architecture and evaluate performance through validation perplexity. We experiment with different pre-training checkpoints, various maximum learning rates, and various warmup lengths. Our results show that while rewarming models first increases the loss on upstream and downstream data, in the longer run it improves the downstream performance, outperforming models trained from scratch$\unicode{x2013}$even for a large downstream dataset.
    Trinary Decision Trees for missing value handling. (arXiv:2309.03561v1 [stat.ML])
    This paper introduces the Trinary decision tree, an algorithm designed to improve the handling of missing data in decision tree regressors and classifiers. Unlike other approaches, the Trinary decision tree does not assume that missing values contain any information about the response. Both theoretical calculations on estimator bias and numerical illustrations using real data sets are presented to compare its performance with established algorithms in different missing data scenarios (Missing Completely at Random (MCAR), and Informative Missingness (IM)). Notably, the Trinary tree outperforms its peers in MCAR settings, especially when data is only missing out-of-sample, while lacking behind in IM settings. A hybrid model, the TrinaryMIA tree, which combines the Trinary tree and the Missing In Attributes (MIA) approach, shows robust performance in all types of missingness. Despite the potential drawback of slower training speed, the Trinary tree offers a promising and more accurate method of handling missing data in decision tree algorithms.
    Generating quantum feature maps using multi-objective genetic algorithm. (arXiv:2309.03307v1 [quant-ph])
    We present a novel approach for efficiently generating quantum feature maps for quantum-enhanced support vector machines, a kernel-based classifier, enabling access to high-dimensional Hilbert space. Our method employs a multi-objective genetic algorithm that simultaneously maximizes classification accuracy while minimizing both the local and non-local gate costs of the quantum feature map's circuit. To achieve this, we define distinct fitness functions for local gates and entanglement gates. Comparisons with classical classifiers are given in order to understand the advantages of using quantum machine learning. Surprisingly, our experiments reveal that the optimal configuration of quantum circuits for the quantum kernel method incorporates a proportional number of non-local gates for entanglement, contrary to previous literature where non-local gates were largely suppressed. Furthermore, we demonstrate that the separability indexes of data can be effectively leveraged to determine the number of non-local gates required for the quantum support vector machine's feature maps. This insight can significantly aid in selecting appropriate parameters, such as the entanglement parameter, in various quantum programming packages like quiskit.org based on data analysis. Our findings offer valuable guidance for enhancing the efficiency and accuracy of quantum machine learning algorithms.
    RAHNet: Retrieval Augmented Hybrid Network for Long-tailed Graph Classification. (arXiv:2308.02335v2 [cs.LG] UPDATED)
    Graph classification is a crucial task in many real-world multimedia applications, where graphs can represent various multimedia data types such as images, videos, and social networks. Previous efforts have applied graph neural networks (GNNs) in balanced situations where the class distribution is balanced. However, real-world data typically exhibit long-tailed class distributions, resulting in a bias towards the head classes when using GNNs and limited generalization ability over the tail classes. Recent approaches mainly focus on re-balancing different classes during model training, which fails to explicitly introduce new knowledge and sacrifices the performance of the head classes. To address these drawbacks, we propose a novel framework called Retrieval Augmented Hybrid Network (RAHNet) to jointly learn a robust feature extractor and an unbiased classifier in a decoupled manner. In the feature extractor training stage, we develop a graph retrieval module to search for relevant graphs that directly enrich the intra-class diversity for the tail classes. Moreover, we innovatively optimize a category-centered supervised contrastive loss to obtain discriminative representations, which is more suitable for long-tailed scenarios. In the classifier fine-tuning stage, we balance the classifier weights with two weight regularization techniques, i.e., Max-norm and weight decay. Experiments on various popular benchmarks verify the superiority of the proposed method against state-of-the-art approaches.
    Natural and Robust Walking using Reinforcement Learning without Demonstrations in High-Dimensional Musculoskeletal Models. (arXiv:2309.02976v2 [cs.RO] UPDATED)
    Humans excel at robust bipedal walking in complex natural environments. In each step, they adequately tune the interaction of biomechanical muscle dynamics and neuronal signals to be robust against uncertainties in ground conditions. However, it is still not fully understood how the nervous system resolves the musculoskeletal redundancy to solve the multi-objective control problem considering stability, robustness, and energy efficiency. In computer simulations, energy minimization has been shown to be a successful optimization target, reproducing natural walking with trajectory optimization or reflex-based control methods. However, these methods focus on particular motions at a time and the resulting controllers are limited when compensating for perturbations. In robotics, reinforcement learning~(RL) methods recently achieved highly stable (and efficient) locomotion on quadruped systems, but the generation of human-like walking with bipedal biomechanical models has required extensive use of expert data sets. This strong reliance on demonstrations often results in brittle policies and limits the application to new behaviors, especially considering the potential variety of movements for high-dimensional musculoskeletal models in 3D. Achieving natural locomotion with RL without sacrificing its incredible robustness might pave the way for a novel approach to studying human walking in complex natural environments. Videos: https://sites.google.com/view/naturalwalkingrl
    ClimSim: An open large-scale dataset for training high-resolution physics emulators in hybrid multi-scale climate simulators. (arXiv:2306.08754v3 [cs.LG] UPDATED)
    Modern climate projections lack adequate spatial and temporal resolution due to computational constraints. A consequence is inaccurate and imprecise predictions of critical processes such as storms. Hybrid methods that combine physics with machine learning (ML) have introduced a new generation of higher fidelity climate simulators that can sidestep Moore's Law by outsourcing compute-hungry, short, high-resolution simulations to ML emulators. However, this hybrid ML-physics simulation approach requires domain-specific treatment and has been inaccessible to ML experts because of lack of training data and relevant, easy-to-use workflows. We present ClimSim, the largest-ever dataset designed for hybrid ML-physics research. It comprises multi-scale climate simulations, developed by a consortium of climate scientists and ML researchers. It consists of 5.7 billion pairs of multivariate input and output vectors that isolate the influence of locally-nested, high-resolution, high-fidelity physics on a host climate simulator's macro-scale physical state. The dataset is global in coverage, spans multiple years at high sampling frequency, and is designed such that resulting emulators are compatible with downstream coupling into operational climate simulators. We implement a range of deterministic and stochastic regression baselines to highlight the ML challenges and their scoring. The data (https://huggingface.co/datasets/LEAP/ClimSim_high-res, https://huggingface.co/datasets/LEAP/ClimSim_low-res, and https://huggingface.co/datasets/LEAP/ClimSim_low-res_aqua-planet) and code (https://leap-stc.github.io/ClimSim) are released openly to support the development of hybrid ML-physics and high-fidelity climate simulations for the benefit of science and society.
    Max-Margin Token Selection in Attention Mechanism. (arXiv:2306.13596v3 [cs.LG] UPDATED)
    Attention mechanism is a central component of the transformer architecture which led to the phenomenal success of large language models. However, the theoretical principles underlying the attention mechanism are poorly understood, especially its nonconvex optimization dynamics. In this work, we explore the seminal softmax-attention model $f(\boldsymbol{X})=\langle \boldsymbol{Xv}, \texttt{softmax}(\boldsymbol{XWp})\rangle$, where $\boldsymbol{X}$ is the token sequence and $(\boldsymbol{v},\boldsymbol{W},\boldsymbol{p})$ are trainable parameters. We prove that running gradient descent on $\boldsymbol{p}$, or equivalently $\boldsymbol{W}$, converges in direction to a max-margin solution that separates $\textit{locally-optimal}$ tokens from non-optimal ones. This clearly formalizes attention as an optimal token selection mechanism. Remarkably, our results are applicable to general data and precisely characterize $\textit{optimality}$ of tokens in terms of the value embeddings $\boldsymbol{Xv}$ and problem geometry. We also provide a broader regularization path analysis that establishes the margin maximizing nature of attention even for nonlinear prediction heads. When optimizing $\boldsymbol{v}$ and $\boldsymbol{p}$ simultaneously with logistic loss, we identify conditions under which the regularization paths directionally converge to their respective hard-margin SVM solutions where $\boldsymbol{v}$ separates the input features based on their labels. Interestingly, the SVM formulation of $\boldsymbol{p}$ is influenced by the support vector geometry of $\boldsymbol{v}$. Finally, we verify our theoretical findings via numerical experiments and provide insights.
    Gradient-Based Feature Learning under Structured Data. (arXiv:2309.03843v1 [stat.ML])
    Recent works have demonstrated that the sample complexity of gradient-based learning of single index models, i.e. functions that depend on a 1-dimensional projection of the input data, is governed by their information exponent. However, these results are only concerned with isotropic data, while in practice the input often contains additional structure which can implicitly guide the algorithm. In this work, we investigate the effect of a spiked covariance structure and reveal several interesting phenomena. First, we show that in the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction, even when the spike is perfectly aligned with the target direction. Next, we show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue. Further, by exploiting the alignment between the (spiked) input covariance and the target, we obtain improved sample complexity compared to the isotropic case. In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent while also outperforming lower bounds for rotationally invariant kernel methods.
    Accelerating Numerical Solvers for Large-Scale Simulation of Dynamical System via NeurVec. (arXiv:2208.03680v2 [cs.CE] UPDATED)
    The large-scale simulation of dynamical systems is critical in numerous scientific and engineering disciplines. However, traditional numerical solvers are limited by the choice of step sizes when estimating integration, resulting in a trade-off between accuracy and computational efficiency. To address this challenge, we introduce a deep learning-based corrector called Neural Vector (NeurVec), which can compensate for integration errors and enable larger time step sizes in simulations. Our extensive experiments on a variety of complex dynamical system benchmarks demonstrate that NeurVec exhibits remarkable generalization capability on a continuous phase space, even when trained using limited and discrete data. NeurVec significantly accelerates traditional solvers, achieving speeds tens to hundreds of times faster while maintaining high levels of accuracy and stability. Moreover, NeurVec's simple-yet-effective design, combined with its ease of implementation, has the potential to establish a new paradigm for fast-solving differential equations based on deep learning.
    Convergence Analysis of Decentralized ASGD. (arXiv:2309.03754v1 [cs.LG])
    Over the last decades, Stochastic Gradient Descent (SGD) has been intensively studied by the Machine Learning community. Despite its versatility and excellent performance, the optimization of large models via SGD still is a time-consuming task. To reduce training time, it is common to distribute the training process across multiple devices. Recently, it has been shown that the convergence of asynchronous SGD (ASGD) will always be faster than mini-batch SGD. However, despite these improvements in the theoretical bounds, most ASGD convergence-rate proofs still rely on a centralized parameter server, which is prone to become a bottleneck when scaling out the gradient computations across many distributed processes. In this paper, we present a novel convergence-rate analysis for decentralized and asynchronous SGD (DASGD) which does not require partial synchronization among nodes nor restrictive network topologies. Specifically, we provide a bound of $\mathcal{O}(\sigma\epsilon^{-2}) + \mathcal{O}(QS_{avg}\epsilon^{-3/2}) + \mathcal{O}(S_{avg}\epsilon^{-1})$ for the convergence rate of DASGD, where $S_{avg}$ is the average staleness between models, $Q$ is a constant that bounds the norm of the gradients, and $\epsilon$ is a (small) error that is allowed within the bound. Furthermore, when gradients are not bounded, we prove the convergence rate of DASGD to be $\mathcal{O}(\sigma\epsilon^{-2}) + \mathcal{O}(\sqrt{\hat{S}_{avg}\hat{S}_{max}}\epsilon^{-1})$, with $\hat{S}_{max}$ and $\hat{S}_{avg}$ representing a loose version of the average and maximum staleness, respectively. Our convergence proof holds for a fixed stepsize and any non-convex, homogeneous, and L-smooth objective function. We anticipate that our results will be of high relevance for the adoption of DASGD by a broad community of researchers and developers.
    Deep Network Approximation: Beyond ReLU to Diverse Activation Functions. (arXiv:2307.06555v3 [cs.LG] UPDATED)
    This paper explores the expressive power of deep neural networks for a diverse range of activation functions. An activation function set $\mathscr{A}$ is defined to encompass the majority of commonly used activation functions, such as $\mathtt{ReLU}$, $\mathtt{LeakyReLU}$, $\mathtt{ReLU}^2$, $\mathtt{ELU}$, $\mathtt{SELU}$, $\mathtt{Softplus}$, $\mathtt{GELU}$, $\mathtt{SiLU}$, $\mathtt{Swish}$, $\mathtt{Mish}$, $\mathtt{Sigmoid}$, $\mathtt{Tanh}$, $\mathtt{Arctan}$, $\mathtt{Softsign}$, $\mathtt{dSiLU}$, and $\mathtt{SRS}$. We demonstrate that for any activation function $\varrho\in \mathscr{A}$, a $\mathtt{ReLU}$ network of width $N$ and depth $L$ can be approximated to arbitrary precision by a $\varrho$-activated network of width $4N$ and depth $2L$ on any bounded set. This finding enables the extension of most approximation results achieved with $\mathtt{ReLU}$ networks to a wide variety of other activation functions, at the cost of slightly larger constants.
    DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models. (arXiv:2309.03883v1 [cs.CL])
    Despite their impressive capabilities, large language models (LLMs) are prone to hallucinations, i.e., generating content that deviates from facts seen during pretraining. We propose a simple decoding strategy for reducing hallucinations with pretrained LLMs that does not require conditioning on retrieved external knowledge nor additional fine-tuning. Our approach obtains the next-token distribution by contrasting the differences in logits obtained from projecting the later layers versus earlier layers to the vocabulary space, exploiting the fact that factual knowledge in an LLMs has generally been shown to be localized to particular transformer layers. We find that this Decoding by Contrasting Layers (DoLa) approach is able to better surface factual knowledge and reduce the generation of incorrect facts. DoLa consistently improves the truthfulness across multiple choices tasks and open-ended generation tasks, for example improving the performance of LLaMA family models on TruthfulQA by 12-17% absolute points, demonstrating its potential in making LLMs reliably generate truthful facts.
    Limitation of Characterizing Implicit Regularization by Data-independent Functions. (arXiv:2201.12198v2 [cs.LG] UPDATED)
    In recent years, understanding the implicit regularization of neural networks (NNs) has become a central task in deep learning theory. However, implicit regularization is itself not completely defined and well understood. In this work, we attempt to mathematically define and study implicit regularization. Importantly, we explore the limitations of a common approach to characterizing implicit regularization using data-independent functions. We propose two dynamical mechanisms, i.e., Two-point and One-point Overlapping mechanisms, based on which we provide two recipes for producing classes of one-hidden-neuron NNs that provably cannot be fully characterized by a type of or all data-independent functions. Following the previous works, our results further emphasize the profound data dependency of implicit regularization in general, inspiring us to study in detail the data dependency of NN implicit regularization in the future.
    Ensemble linear interpolators: The role of ensembling. (arXiv:2309.03354v1 [stat.ML])
    Interpolators are unstable. For example, the mininum $\ell_2$ norm least square interpolator exhibits unbounded test errors when dealing with noisy data. In this paper, we study how ensemble stabilizes and thus improves the generalization performance, measured by the out-of-sample prediction risk, of an individual interpolator. We focus on bagged linear interpolators, as bagging is a popular randomization-based ensemble method that can be implemented in parallel. We introduce the multiplier-bootstrap-based bagged least square estimator, which can then be formulated as an average of the sketched least square estimators. The proposed multiplier bootstrap encompasses the classical bootstrap with replacement as a special case, along with a more intriguing variant which we call the Bernoulli bootstrap. Focusing on the proportional regime where the sample size scales proportionally with the feature dimensionality, we investigate the out-of-sample prediction risks of the sketched and bagged least square estimators in both underparametrized and overparameterized regimes. Our results reveal the statistical roles of sketching and bagging. In particular, sketching modifies the aspect ratio and shifts the interpolation threshold of the minimum $\ell_2$ norm estimator. However, the risk of the sketched estimator continues to be unbounded around the interpolation threshold due to excessive variance. In stark contrast, bagging effectively mitigates this variance, leading to a bounded limiting out-of-sample prediction risk. To further understand this stability improvement property, we establish that bagging acts as a form of implicit regularization, substantiated by the equivalence of the bagged estimator with its explicitly regularized counterpart. We also discuss several extensions.
    On Root Cause Localization and Anomaly Mitigation through Causal Inference. (arXiv:2212.04031v2 [cs.LG] UPDATED)
    Due to a wide spectrum of applications in the real world, such as security, financial surveillance, and health risk, various deep anomaly detection models have been proposed and achieved state-of-the-art performance. However, besides being effective, in practice, the practitioners would further like to know what causes the abnormal outcome and how to further fix it. In this work, we propose RootCLAM, which aims to achieve Root Cause Localization and Anomaly Mitigation from a causal perspective. Especially, we formulate anomalies caused by external interventions on the normal causal mechanism and aim to locate the abnormal features with external interventions as root causes. After that, we further propose an anomaly mitigation approach that aims to recommend mitigation actions on abnormal features to revert the abnormal outcomes such that the counterfactuals guided by the causal mechanism are normal. Experiments on three datasets show that our approach can locate the root causes and further flip the abnormal labels.
    Alzheimer Disease Detection from Raman Spectroscopy of the Cerebrospinal Fluid via Topological Machine Learning. (arXiv:2309.03664v1 [cs.LG])
    The cerebrospinal fluid (CSF) of 19 subjects who received a clinical diagnosis of Alzheimer's disease (AD) as well as of 5 pathological controls have been collected and analysed by Raman spectroscopy (RS). We investigated whether the raw and preprocessed Raman spectra could be used to distinguish AD from controls. First, we applied standard Machine Learning (ML) methods obtaining unsatisfactory results. Then, we applied ML to a set of topological descriptors extracted from raw spectra, achieving a very good classification accuracy (>87%). Although our results are preliminary, they indicate that RS and topological analysis together may provide an effective combination to confirm or disprove a clinical diagnosis of AD. The next steps will include enlarging the dataset of CSF samples to validate the proposed method better and, possibly, to understand if topological data analysis could support the characterization of AD subtypes.
    Prime and Modulate Learning: Generation of forward models with signed back-propagation and environmental cues. (arXiv:2309.03825v1 [cs.LG])
    Deep neural networks employing error back-propagation for learning can suffer from exploding and vanishing gradient problems. Numerous solutions have been proposed such as normalisation techniques or limiting activation functions to linear rectifying units. In this work we follow a different approach which is particularly applicable to closed-loop learning of forward models where back-propagation makes exclusive use of the sign of the error signal to prime the learning, whilst a global relevance signal modulates the rate of learning. This is inspired by the interaction between local plasticity and a global neuromodulation. For example, whilst driving on an empty road, one can allow for slow step-wise optimisation of actions, whereas, at a busy junction, an error must be corrected at once. Hence, the error is the priming signal and the intensity of the experience is a modulating factor in the weight change. The advantages of this Prime and Modulate paradigm is twofold: it is free from normalisation and it makes use of relevant cues from the environment to enrich the learning. We present a mathematical derivation of the learning rule in z-space and demonstrate the real-time performance with a robotic platform. The results show a significant improvement in the speed of convergence compared to that of the conventional back-propagation.
    Copula Representations and Error Surface Projections for the Exclusive Or Problem. (arXiv:1907.04483v2 [cs.LG] UPDATED)
    The exclusive or (xor) function is one of the simplest examples that illustrate why nonlinear feedforward networks are superior to linear regression for machine learning applications. We review the xor representation and approximation problems and discuss their solutions in terms of probabilistic logic and associative copula functions. After briefly reviewing the specification of feedforward networks, we compare the dynamics of learned error surfaces with different activation functions such as RELU and tanh through a set of colorful three-dimensional charts. The copula representations extend xor from Boolean to real values, thereby providing a convenient way to demonstrate the concept of cross-validation on in-sample and out-sample data sets. Our approach is pedagogical and is meant to be a machine learning prolegomenon.
    Towards provably efficient quantum algorithms for large-scale machine-learning models. (arXiv:2303.03428v4 [quant-ph] UPDATED)
    Large machine learning models are revolutionary technologies of artificial intelligence whose bottlenecks include huge computational expenses, power, and time used both in the pre-training and fine-tuning process. In this work, we show that fault-tolerant quantum computing could possibly provide provably efficient resolutions for generic (stochastic) gradient descent algorithms, scaling as $\mathcal{O}(T^2 \times \text{polylog}(n))$, where $n$ is the size of the models and $T$ is the number of iterations in the training, as long as the models are both sufficiently dissipative and sparse, with small learning rates. Based on earlier efficient quantum algorithms for dissipative differential equations, we find and prove that similar algorithms work for (stochastic) gradient descent, the primary algorithm for machine learning. In practice, we benchmark instances of large machine learning models from 7 million to 103 million parameters. We find that, in the context of sparse training, a quantum enhancement is possible at the early stage of learning after model pruning, motivating a sparse parameter download and re-upload scheme. Our work shows solidly that fault-tolerant quantum algorithms could potentially contribute to most state-of-the-art, large-scale machine-learning problems.
    Global Optimization for Cardinality-constrained Minimum Sum-of-Squares Clustering via Semidefinite Programming. (arXiv:2209.08901v3 [math.OC] UPDATED)
    The minimum sum-of-squares clustering (MSSC), or k-means type clustering, has been recently extended to exploit prior knowledge on the cardinality of each cluster. Such knowledge is used to increase performance as well as solution quality. In this paper, we propose a global optimization approach based on the branch-and-cut technique to solve the cardinality-constrained MSSC. For the lower bound routine, we use the semidefinite programming (SDP) relaxation recently proposed by Rujeerapaiboon et al. [SIAM J. Optim. 29(2), 1211-1239, (2019)]. However, this relaxation can be used in a branch-and-cut method only for small-size instances. Therefore, we derive a new SDP relaxation that scales better with the instance size and the number of clusters. In both cases, we strengthen the bound by adding polyhedral cuts. Benefiting from a tailored branching strategy which enforces pairwise constraints, we reduce the complexity of the problems arising in the children nodes. For the upper bound, instead, we present a local search procedure that exploits the solution of the SDP relaxation solved at each node. Computational results show that the proposed algorithm globally solves, for the first time, real-world instances of size 10 times larger than those solved by state-of-the-art exact methods.
    Blended-NeRF: Zero-Shot Object Generation and Blending in Existing Neural Radiance Fields. (arXiv:2306.12760v2 [cs.CV] UPDATED)
    Editing a local region or a specific object in a 3D scene represented by a NeRF or consistently blending a new realistic object into the scene is challenging, mainly due to the implicit nature of the scene representation. We present Blended-NeRF, a robust and flexible framework for editing a specific region of interest in an existing NeRF scene, based on text prompts, along with a 3D ROI box. Our method leverages a pretrained language-image model to steer the synthesis towards a user-provided text prompt, along with a 3D MLP model initialized on an existing NeRF scene to generate the object and blend it into a specified region in the original scene. We allow local editing by localizing a 3D ROI box in the input scene, and blend the content synthesized inside the ROI with the existing scene using a novel volumetric blending technique. To obtain natural looking and view-consistent results, we leverage existing and new geometric priors and 3D augmentations for improving the visual fidelity of the final result. We test our framework both qualitatively and quantitatively on a variety of real 3D scenes and text prompts, demonstrating realistic multi-view consistent results with much flexibility and diversity compared to the baselines. Finally, we show the applicability of our framework for several 3D editing applications, including adding new objects to a scene, removing/replacing/altering existing objects, and texture conversion.
    Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models. (arXiv:2307.14971v2 [cs.CV] UPDATED)
    With the overwhelming trend of mask image modeling led by MAE, generative pre-training has shown a remarkable potential to boost the performance of fundamental models in 2D vision. However, in 3D vision, the over-reliance on Transformer-based backbones and the unordered nature of point clouds have restricted the further development of generative pre-training. In this paper, we propose a novel 3D-to-2D generative pre-training method that is adaptable to any point cloud model. We propose to generate view images from different instructed poses via the cross-attention mechanism as the pre-training scheme. Generating view images has more precise supervision than its point cloud counterpart, thus assisting 3D backbones to have a finer comprehension of the geometrical structure and stereoscopic relations of the point cloud. Experimental results have proved the superiority of our proposed 3D-to-2D generative pre-training over previous pre-training methods. Our method is also effective in boosting the performance of architecture-oriented approaches, achieving state-of-the-art performance when fine-tuning on ScanObjectNN classification and ShapeNetPart segmentation tasks. Code is available at https://github.com/wangzy22/TAP.
    Bootstrapping Adaptive Human-Machine Interfaces with Offline Reinforcement Learning. (arXiv:2309.03839v1 [cs.RO])
    Adaptive interfaces can help users perform sequential decision-making tasks like robotic teleoperation given noisy, high-dimensional command signals (e.g., from a brain-computer interface). Recent advances in human-in-the-loop machine learning enable such systems to improve by interacting with users, but tend to be limited by the amount of data that they can collect from individual users in practice. In this paper, we propose a reinforcement learning algorithm to address this by training an interface to map raw command signals to actions using a combination of offline pre-training and online fine-tuning. To address the challenges posed by noisy command signals and sparse rewards, we develop a novel method for representing and inferring the user's long-term intent for a given trajectory. We primarily evaluate our method's ability to assist users who can only communicate through noisy, high-dimensional input channels through a user study in which 12 participants performed a simulated navigation task by using their eye gaze to modulate a 128-dimensional command signal from their webcam. The results show that our method enables successful goal navigation more often than a baseline directional interface, by learning to denoise user commands signals and provide shared autonomy assistance. We further evaluate on a simulated Sawyer pushing task with eye gaze control, and the Lunar Lander game with simulated user commands, and find that our method improves over baseline interfaces in these domains as well. Extensive ablation experiments with simulated user commands empirically motivate each component of our method.
    AnthroNet: Conditional Generation of Humans via Anthropometrics. (arXiv:2309.03812v1 [cs.CV])
    We present a novel human body model formulated by an extensive set of anthropocentric measurements, which is capable of generating a wide range of human body shapes and poses. The proposed model enables direct modeling of specific human identities through a deep generative architecture, which can produce humans in any arbitrary pose. It is the first of its kind to have been trained end-to-end using only synthetically generated data, which not only provides highly accurate human mesh representations but also allows for precise anthropometry of the body. Moreover, using a highly diverse animation library, we articulated our synthetic humans' body and hands to maximize the diversity of the learnable priors for model training. Our model was trained on a dataset of $100k$ procedurally-generated posed human meshes and their corresponding anthropometric measurements. Our synthetic data generator can be used to generate millions of unique human identities and poses for non-commercial academic research purposes.
    ReFit: A Framework for Refinement of Weakly Supervised Semantic Segmentation using Object Border Fitting for Medical Images. (arXiv:2303.07853v2 [cs.CV] UPDATED)
    Weakly Supervised Semantic Segmentation (WSSS) relying only on image-level supervision is a promising approach to deal with the need for Segmentation networks, especially for generating a large number of pixel-wise masks in a given dataset. However, most state-of-the-art image-level WSSS techniques lack an understanding of the geometric features embedded in the images since the network cannot derive any object boundary information from just image-level labels. We define a boundary here as the line separating an object and its background, or two different objects. To address this drawback, we are proposing our novel ReFit framework, which deploys state-of-the-art class activation maps combined with various post-processing techniques in order to achieve fine-grained higher-accuracy segmentation masks. To achieve this, we investigate a state-of-the-art unsupervised segmentation network that can be used to construct a boundary map, which enables ReFit to predict object locations with sharper boundaries. By applying our method to WSSS predictions, we achieved up to 10% improvement over the current state-of-the-art WSSS methods for medical imaging. The framework is open-source, to ensure that our results are reproducible, and accessible online at https://github.com/bharathprabakaran/ReFit.
    Adversarial Likelihood Estimation With One-Way Flows. (arXiv:2307.09882v2 [cs.LG] UPDATED)
    Generative Adversarial Networks (GANs) can produce high-quality samples, but do not provide an estimate of the probability density around the samples. However, it has been noted that maximizing the log-likelihood within an energy-based setting can lead to an adversarial framework where the discriminator provides unnormalized density (often called energy). We further develop this perspective, incorporate importance sampling, and show that 1) Wasserstein GAN performs a biased estimate of the partition function, and we propose instead to use an unbiased estimator; and 2) when optimizing for likelihood, one must maximize generator entropy. This is hypothesized to provide a better mode coverage. Different from previous works, we explicitly compute the density of the generated samples. This is the key enabler to designing an unbiased estimator of the partition function and computation of the generator entropy term. The generator density is obtained via a new type of flow network, called one-way flow network, that is less constrained in terms of architecture, as it does not require a tractable inverse function. Our experimental results show that our method converges faster, produces comparable sample quality to GANs with similar architecture, successfully avoids over-fitting to commonly used datasets and produces smooth low-dimensional latent representations of the training data.
    Pure Exploration in Bandits with Linear Constraints. (arXiv:2306.12774v2 [cs.LG] UPDATED)
    We address the problem of identifying the optimal policy with a fixed confidence level in a multi-armed bandit setup, when \emph{the arms are subject to linear constraints}. Unlike the standard best-arm identification problem which is well studied, the optimal policy in this case may not be deterministic and could mix between several arms. This changes the geometry of the problem which we characterize via an information-theoretic lower bound. We introduce two asymptotically optimal algorithms for this setting, one based on the Track-and-Stop method and the other based on a game-theoretic approach. Both these algorithms try to track an optimal allocation based on the lower bound and computed by a weighted projection onto the boundary of a normal cone. Finally, we provide empirical results that validate our bounds and visualize how constraints change the hardness of the problem.
    Better Practices for Domain Adaptation. (arXiv:2309.03879v1 [cs.LG])
    Distribution shifts are all too common in real-world applications of machine learning. Domain adaptation (DA) aims to address this by providing various frameworks for adapting models to the deployment data without using labels. However, the domain shift scenario raises a second more subtle challenge: the difficulty of performing hyperparameter optimisation (HPO) for these adaptation algorithms without access to a labelled validation set. The unclear validation protocol for DA has led to bad practices in the literature, such as performing HPO using the target test labels when, in real-world scenarios, they are not available. This has resulted in over-optimism about DA research progress compared to reality. In this paper, we analyse the state of DA when using good evaluation practice, by benchmarking a suite of candidate validation criteria and using them to assess popular adaptation algorithms. We show that there are challenges across all three branches of domain adaptation methodology including Unsupervised Domain Adaptation (UDA), Source-Free Domain Adaptation (SFDA), and Test Time Adaptation (TTA). While the results show that realistically achievable performance is often worse than expected, they also show that using proper validation splits is beneficial, as well as showing that some previously unexplored validation metrics provide the best options to date. Altogether, our improved practices covering data, training, validation and hyperparameter optimisation form a new rigorous pipeline to improve benchmarking, and hence research progress, within this important field going forward.
    Dataset Generation and Bonobo Classification from Weakly Labelled Videos. (arXiv:2309.03671v1 [cs.CV])
    This paper presents a bonobo detection and classification pipeline built from the commonly used machine learning methods. Such application is motivated by the need to test bonobos in their enclosure using touch screen devices without human assistance. This work introduces a newly acquired dataset based on bonobo recordings generated semi-automatically. The recordings are weakly labelled and fed to a macaque detector in order to spatially detect the individual present in the video. Handcrafted features coupled with different classification algorithms and deep-learning methods using a ResNet architecture are investigated for bonobo identification. Performance is compared in terms of classification accuracy on the splits of the database using different data separation methods. We demonstrate the importance of data preparation and how a wrong data separation can lead to false good results. Finally, after a meaningful separation of the data, the best classification performance is obtained using a fine-tuned ResNet model and reaches 75% of accuracy.
    A computationally lightweight safe learning algorithm. (arXiv:2309.03672v1 [eess.SY])
    Safety is an essential asset when learning control policies for physical systems, as violating safety constraints during training can lead to expensive hardware damage. In response to this need, the field of safe learning has emerged with algorithms that can provide probabilistic safety guarantees without knowledge of the underlying system dynamics. Those algorithms often rely on Gaussian process inference. Unfortunately, Gaussian process inference scales cubically with the number of data points, limiting applicability to high-dimensional and embedded systems. In this paper, we propose a safe learning algorithm that provides probabilistic safety guarantees but leverages the Nadaraya-Watson estimator instead of Gaussian processes. For the Nadaraya-Watson estimator, we can reach logarithmic scaling with the number of data points. We provide theoretical guarantees for the estimates, embed them into a safe learning algorithm, and show numerical experiments on a simulated seven-degrees-of-freedom robot manipulator.
    Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck. (arXiv:2309.03800v1 [cs.LG])
    This work investigates the nuanced algorithm design choices for deep learning in the presence of computational-statistical gaps. We begin by considering offline sparse parity learning, a supervised classification problem which admits a statistical query lower bound for gradient-based training of a multilayer perceptron. This lower bound can be interpreted as a multi-resource tradeoff frontier: successful learning can only occur if one is sufficiently rich (large model), knowledgeable (large dataset), patient (many training iterations), or lucky (many random guesses). We show, theoretically and experimentally, that sparse initialization and increasing network width yield significant improvements in sample efficiency in this setting. Here, width plays the role of parallel search: it amplifies the probability of finding "lottery ticket" neurons, which learn sparse features more sample-efficiently. Finally, we show that the synthetic sparse parity task can be useful as a proxy for real problems requiring axis-aligned feature learning. We demonstrate improved sample efficiency on tabular classification benchmarks by using wide, sparsely-initialized MLP models; these networks sometimes outperform tuned random forests.
    CPU frequency scheduling of real-time applications on embedded devices with temporal encoding-based deep reinforcement learning. (arXiv:2309.03779v1 [cs.LG])
    Small devices are frequently used in IoT and smart-city applications to perform periodic dedicated tasks with soft deadlines. This work focuses on developing methods to derive efficient power-management methods for periodic tasks on small devices. We first study the limitations of the existing Linux built-in methods used in small devices. We illustrate three typical workload/system patterns that are challenging to manage with Linux's built-in solutions. We develop a reinforcement-learning-based technique with temporal encoding to derive an effective DVFS governor even with the presence of the three system patterns. The derived governor uses only one performance counter, the same as the built-in Linux mechanism, and does not require an explicit task model for the workload. We implemented a prototype system on the Nvidia Jetson Nano Board and experimented with it with six applications, including two self-designed and four benchmark applications. Under different deadline constraints, our approach can quickly derive a DVFS governor that can adapt to performance requirements and outperform the built-in Linux approach in energy saving. On Mibench workloads, with performance slack ranging from 0.04 s to 0.4 s, the proposed method can save 3% - 11% more energy compared to Ondemand. AudioReg and FaceReg applications tested have 5%- 14% energy-saving improvement. We have open-sourced the implementation of our in-kernel quantized neural network engine. The codebase can be found at: https://github.com/coladog/tinyagent.
    Your Battery Is a Blast! Safeguarding Against Counterfeit Batteries with Authentication. (arXiv:2309.03607v1 [cs.CR])
    Lithium-ion (Li-ion) batteries are the primary power source in various applications due to their high energy and power density. Their market was estimated to be up to 48 billion U.S. dollars in 2022. However, the widespread adoption of Li-ion batteries has resulted in counterfeit cell production, which can pose safety hazards to users. Counterfeit cells can cause explosions or fires, and their prevalence in the market makes it difficult for users to detect fake cells. Indeed, current battery authentication methods can be susceptible to advanced counterfeiting techniques and are often not adaptable to various cells and systems. In this paper, we improve the state of the art on battery authentication by proposing two novel methodologies, DCAuth and EISthentication, which leverage the internal characteristics of each cell through Machine Learning models. Our methods automatically authenticate lithium-ion battery models and architectures using data from their regular usage without the need for any external device. They are also resilient to the most common and critical counterfeit practices and can scale to several batteries and devices. To evaluate the effectiveness of our proposed methodologies, we analyze time-series data from a total of 20 datasets that we have processed to extract meaningful features for our analysis. Our methods achieve high accuracy in battery authentication for both architectures (up to 0.99) and models (up to 0.96). Moreover, our methods offer comparable identification performances. By using our proposed methodologies, manufacturers can ensure that devices only use legitimate batteries, guaranteeing the operational state of any system and safety measures for the users.
    Temporal Inductive Path Neural Network for Temporal Knowledge Graph Reasoning. (arXiv:2309.03251v1 [cs.AI])
    Temporal Knowledge Graph (TKG) is an extension of traditional Knowledge Graph (KG) that incorporates the dimension of time. Reasoning on TKGs is a crucial task that aims to predict future facts based on historical occurrences. The key challenge lies in uncovering structural dependencies within historical subgraphs and temporal patterns. Most existing approaches model TKGs relying on entity modeling, as nodes in the graph play a crucial role in knowledge representation. However, the real-world scenario often involves an extensive number of entities, with new entities emerging over time. This makes it challenging for entity-dependent methods to cope with extensive volumes of entities, and effectively handling newly emerging entities also becomes a significant challenge. Therefore, we propose Temporal Inductive Path Neural Network (TiPNN), which models historical information in an entity-independent perspective. Specifically, TiPNN adopts a unified graph, namely history temporal graph, to comprehensively capture and encapsulate information from history. Subsequently, we utilize the defined query-aware temporal paths to model historical path information related to queries on history temporal graph for the reasoning. Extensive experiments illustrate that the proposed model not only attains significant performance enhancements but also handles inductive settings, while additionally facilitating the provision of reasoning evidence through history temporal graphs.
    Feature Enhancer Segmentation Network (FES-Net) for Vessel Segmentation. (arXiv:2309.03535v1 [eess.IV])
    Diseases such as diabetic retinopathy and age-related macular degeneration pose a significant risk to vision, highlighting the importance of precise segmentation of retinal vessels for the tracking and diagnosis of progression. However, existing vessel segmentation methods that heavily rely on encoder-decoder structures struggle to capture contextual information about retinal vessel configurations, leading to challenges in reconciling semantic disparities between encoder and decoder features. To address this, we propose a novel feature enhancement segmentation network (FES-Net) that achieves accurate pixel-wise segmentation without requiring additional image enhancement steps. FES-Net directly processes the input image and utilizes four prompt convolutional blocks (PCBs) during downsampling, complemented by a shallow upsampling approach to generate a binary mask for each class. We evaluate the performance of FES-Net on four publicly available state-of-the-art datasets: DRIVE, STARE, CHASE, and HRF. The evaluation results clearly demonstrate the superior performance of FES-Net compared to other competitive approaches documented in the existing literature.
    A Majority Invariant Approach to Patch Robustness Certification for Deep Learning Models. (arXiv:2308.00452v2 [cs.LG] UPDATED)
    Patch robustness certification ensures no patch within a given bound on a sample can manipulate a deep learning model to predict a different label. However, existing techniques cannot certify samples that cannot meet their strict bars at the classifier or patch region levels. This paper proposes MajorCert. MajorCert firstly finds all possible label sets manipulatable by the same patch region on the same sample across the underlying classifiers, then enumerates their combinations element-wise, and finally checks whether the majority invariant of all these combinations is intact to certify samples.
    Fast FixMatch: Faster Semi-Supervised Learning with Curriculum Batch Size. (arXiv:2309.03469v1 [cs.LG])
    Advances in Semi-Supervised Learning (SSL) have almost entirely closed the gap between SSL and Supervised Learning at a fraction of the number of labels. However, recent performance improvements have often come \textit{at the cost of significantly increased training computation}. To address this, we propose Curriculum Batch Size (CBS), \textit{an unlabeled batch size curriculum which exploits the natural training dynamics of deep neural networks.} A small unlabeled batch size is used in the beginning of training and is gradually increased to the end of training. A fixed curriculum is used regardless of dataset, model or number of epochs, and reduced training computations is demonstrated on all settings. We apply CBS, strong labeled augmentation, Curriculum Pseudo Labeling (CPL) \citep{FlexMatch} to FixMatch \citep{FixMatch} and term the new SSL algorithm Fast FixMatch. We perform an ablation study to show that strong labeled augmentation and/or CPL do not significantly reduce training computations, but, in synergy with CBS, they achieve optimal performance. Fast FixMatch also achieves substantially higher data utilization compared to previous state-of-the-art. Fast FixMatch achieves between $2.1\times$ - $3.4\times$ reduced training computations on CIFAR-10 with all but 40, 250 and 4000 labels removed, compared to vanilla FixMatch, while attaining the same cited state-of-the-art error rate \citep{FixMatch}. Similar results are achieved for CIFAR-100, SVHN and STL-10. Finally, Fast MixMatch achieves between $2.6\times$ - $3.3\times$ reduced training computations in federated SSL tasks and online/streaming learning SSL tasks, which further demonstrate the generializbility of Fast MixMatch to different scenarios and tasks.
    DTW+S: Shape-based Comparison of Time-series with Ordered Local Trend. (arXiv:2309.03579v1 [cs.LG])
    Measuring distance or similarity between time-series data is a fundamental aspect of many applications including classification and clustering. Existing measures may fail to capture similarities due to local trends (shapes) and may even produce misleading results. Our goal is to develop a measure that looks for similar trends occurring around similar times and is easily interpretable for researchers in applied domains. This is particularly useful for applications where time-series have a sequence of meaningful local trends that are ordered, such as in epidemics (a surge to an increase to a peak to a decrease). We propose a novel measure, DTW+S, which creates an interpretable "closeness-preserving" matrix representation of the time-series, where each column represents local trends, and then it applies Dynamic Time Warping to compute distances between these matrices. We present a theoretical analysis that supports the choice of this representation. We demonstrate the utility of DTW+S in ensemble building and clustering of epidemic curves. We also demonstrate that our approach results in better classification compared to Dynamic Time Warping for a class of datasets, particularly when local trends rather than scale play a decisive role.
    GraPhSyM: Graph Physical Synthesis Model. (arXiv:2308.03944v2 [cs.LG] UPDATED)
    In this work, we introduce GraPhSyM, a Graph Attention Network (GATv2) model for fast and accurate estimation of post-physical synthesis circuit delay and area metrics from pre-physical synthesis circuit netlists. Once trained, GraPhSyM provides accurate visibility of final design metrics to early EDA stages, such as logic synthesis, without running the slow physical synthesis flow, enabling global co-optimization across stages. Additionally, the swift and precise feedback provided by GraPhSyM is instrumental for machine-learning-based EDA optimization frameworks. Given a gate-level netlist of a circuit represented as a graph, GraPhSyM utilizes graph structure, connectivity, and electrical property features to predict the impact of physical synthesis transformations such as buffer insertion and gate sizing. When trained on a dataset of 6000 prefix adder designs synthesized at an aggressive delay target, GraPhSyM can accurately predict the post-synthesis delay (98.3%) and area (96.1%) metrics of unseen adders with a fast 0.22s inference time. Furthermore, we illustrate the compositionality of GraPhSyM by employing the model trained on a fixed delay target to accurately anticipate post-synthesis metrics at a variety of unseen delay targets. Lastly, we report promising generalization capabilities of the GraPhSyM model when it is evaluated on circuits different from the adders it was exclusively trained on. The results show the potential for GraPhSyM to serve as a powerful tool for advanced optimization techniques and as an oracle for EDA machine learning frameworks.
    A State Representation for Diminishing Rewards. (arXiv:2309.03710v1 [cs.LG])
    A common setting in multitask reinforcement learning (RL) demands that an agent rapidly adapt to various stationary reward functions randomly sampled from a fixed distribution. In such situations, the successor representation (SR) is a popular framework which supports rapid policy evaluation by decoupling a policy's expected discounted, cumulative state occupancies from a specific reward function. However, in the natural world, sequential tasks are rarely independent, and instead reflect shifting priorities based on the availability and subjective perception of rewarding stimuli. Reflecting this disjunction, in this paper we study the phenomenon of diminishing marginal utility and introduce a novel state representation, the $\lambda$ representation ($\lambda$R) which, surprisingly, is required for policy evaluation in this setting and which generalizes the SR as well as several other state representations from the literature. We establish the $\lambda$R's formal properties and examine its normative advantages in the context of machine learning, as well as its usefulness for studying natural behaviors, particularly foraging.
    DeepAD: A Robust Deep Learning Model of Alzheimer's Disease Progression for Real-World Clinical Applications. (arXiv:2203.09096v5 [cs.LG] UPDATED)
    The ability to predict the future trajectory of a patient is a key step toward the development of therapeutics for complex diseases such as Alzheimer's disease (AD). However, most machine learning approaches developed for prediction of disease progression are either single-task or single-modality models, which can not be directly adopted to our setting involving multi-task learning with high dimensional images. Moreover, most of those approaches are trained on a single dataset (i.e. cohort), which can not be generalized to other cohorts. We propose a novel multimodal multi-task deep learning model to predict AD progression by analyzing longitudinal clinical and neuroimaging data from multiple cohorts. Our proposed model integrates high dimensional MRI features from a 3D convolutional neural network with other data modalities, including clinical and demographic information, to predict the future trajectory of patients. Our model employs an adversarial loss to alleviate the study-specific imaging bias, in particular the inter-study domain shifts. In addition, a Sharpness-Aware Minimization (SAM) optimization technique is applied to further improve model generalization. The proposed model is trained and tested on various datasets in order to evaluate and validate the results. Our results showed that 1) our model yields significant improvement over the baseline models, and 2) models using extracted neuroimaging features from 3D convolutional neural network outperform the same models when applied to MRI-derived volumetric features.
    Scalable Learning of Intrusion Responses through Recursive Decomposition. (arXiv:2309.03292v1 [eess.SY])
    We study automated intrusion response for an IT infrastructure and formulate the interaction between an attacker and a defender as a partially observed stochastic game. To solve the game we follow an approach where attack and defense strategies co-evolve through reinforcement learning and self-play toward an equilibrium. Solutions proposed in previous work prove the feasibility of this approach for small infrastructures but do not scale to realistic scenarios due to the exponential growth in computational complexity with the infrastructure size. We address this problem by introducing a method that recursively decomposes the game into subgames which can be solved in parallel. Applying optimal stopping theory we show that the best response strategies in these subgames exhibit threshold structures, which allows us to compute them efficiently. To solve the decomposed game we introduce an algorithm called Decompositional Fictitious Self-Play (DFSP), which learns Nash equilibria through stochastic approximation. We evaluate the learned strategies in an emulation environment where real intrusions and response actions can be executed. The results show that the learned strategies approximate an equilibrium and that DFSP significantly outperforms a state-of-the-art algorithm for a realistic infrastructure configuration.
    Implicit Design Choices and Their Impact on Emotion Recognition Model Development and Evaluation. (arXiv:2309.03238v1 [cs.LG])
    Emotion recognition is a complex task due to the inherent subjectivity in both the perception and production of emotions. The subjectivity of emotions poses significant challenges in developing accurate and robust computational models. This thesis examines critical facets of emotion recognition, beginning with the collection of diverse datasets that account for psychological factors in emotion production. To handle the challenge of non-representative training data, this work collects the Multimodal Stressed Emotion dataset, which introduces controlled stressors during data collection to better represent real-world influences on emotion production. To address issues with label subjectivity, this research comprehensively analyzes how data augmentation techniques and annotation schemes impact emotion perception and annotator labels. It further handles natural confounding variables and variations by employing adversarial networks to isolate key factors like stress from learned emotion representations during model training. For tackling concerns about leakage of sensitive demographic variables, this work leverages adversarial learning to strip sensitive demographic information from multimodal encodings. Additionally, it proposes optimized sociological evaluation metrics aligned with cost-effective, real-world needs for model testing. This research advances robust, practical emotion recognition through multifaceted studies of challenges in datasets, labels, modeling, demographic and membership variable encoding in representations, and evaluation. The groundwork has been laid for cost-effective, generalizable emotion recognition models that are less likely to encode sensitive demographic information.
    A Generalized Bandsplit Neural Network for Cinematic Audio Source Separation. (arXiv:2309.02539v2 [eess.AS] UPDATED)
    Cinematic audio source separation is a relatively new subtask of audio source separation, with the aim of extracting the dialogue stem, the music stem, and the effects stem from their mixture. In this work, we developed a model generalizing the Bandsplit RNN for any complete or overcomplete partitions of the frequency axis. Psycho-acoustically motivated frequency scales were used to inform the band definitions which are now defined with redundancy for more reliable feature extraction. A loss function motivated by the signal-to-noise ratio and the sparsity-promoting property of the 1-norm was proposed. We additionally exploit the information-sharing property of a common-encoder setup to reduce computational complexity during both training and inference, improve separation performance for hard-to-generalize classes of sounds, and allow flexibility during inference time with easily detachable decoders. Our best model sets the state of the art on the Divide and Remaster dataset with performance above the ideal ratio mask for the dialogue stem.
    CenTime: Event-Conditional Modelling of Censoring in Survival Analysis. (arXiv:2309.03851v1 [cs.LG])
    Survival analysis is a valuable tool for estimating the time until specific events, such as death or cancer recurrence, based on baseline observations. This is particularly useful in healthcare to prognostically predict clinically important events based on patient data. However, existing approaches often have limitations; some focus only on ranking patients by survivability, neglecting to estimate the actual event time, while others treat the problem as a classification task, ignoring the inherent time-ordered structure of the events. Furthermore, the effective utilization of censored samples - training data points where the exact event time is unknown - is essential for improving the predictive accuracy of the model. In this paper, we introduce CenTime, a novel approach to survival analysis that directly estimates the time to event. Our method features an innovative event-conditional censoring mechanism that performs robustly even when uncensored data is scarce. We demonstrate that our approach forms a consistent estimator for the event model parameters, even in the absence of uncensored data. Furthermore, CenTime is easily integrated with deep learning models with no restrictions on batch size or the number of uncensored samples. We compare our approach with standard survival analysis methods, including the Cox proportional-hazard model and DeepHit. Our results indicate that CenTime offers state-of-the-art performance in predicting time-to-death while maintaining comparable ranking performance. Our implementation is publicly available at https://github.com/ahmedhshahin/CenTime.
    VLUCI: Variational Learning of Unobserved Confounders for Counterfactual Inference. (arXiv:2308.00904v2 [cs.LG] UPDATED)
    Causal inference plays a vital role in diverse domains like epidemiology, healthcare, and economics. De-confounding and counterfactual prediction in observational data has emerged as a prominent concern in causal inference research. While existing models tackle observed confounders, the presence of unobserved confounders remains a significant challenge, distorting causal inference and impacting counterfactual outcome accuracy. To address this, we propose a novel variational learning model of unobserved confounders for counterfactual inference (VLUCI), which generates the posterior distribution of unobserved confounders. VLUCI relaxes the unconfoundedness assumption often overlooked by most causal inference methods. By disentangling observed and unobserved confounders, VLUCI constructs a doubly variational inference model to approximate the distribution of unobserved confounders, which are used for inferring more accurate counterfactual outcomes. Extensive experiments on synthetic and semi-synthetic datasets demonstrate VLUCI's superior performance in inferring unobserved confounders. It is compatible with state-of-the-art counterfactual inference models, significantly improving inference accuracy at both group and individual levels. Additionally, VLUCI provides confidence intervals for counterfactual outcomes, aiding decision-making in risk-sensitive domains. We further clarify the considerations when applying VLUCI to cases where unobserved confounders don't strictly conform to our model assumptions using the public IHDP dataset as an example, highlighting the practical advantages of VLUCI.
    DiFaReli: Diffusion Face Relighting. (arXiv:2304.09479v3 [cs.CV] UPDATED)
    We present a novel approach to single-view face relighting in the wild. Handling non-diffuse effects, such as global illumination or cast shadows, has long been a challenge in face relighting. Prior work often assumes Lambertian surfaces, simplified lighting models or involves estimating 3D shape, albedo, or a shadow map. This estimation, however, is error-prone and requires many training examples with lighting ground truth to generalize well. Our work bypasses the need for accurate estimation of intrinsic components and can be trained solely on 2D images without any light stage data, multi-view images, or lighting ground truth. Our key idea is to leverage a conditional diffusion implicit model (DDIM) for decoding a disentangled light encoding along with other encodings related to 3D shape and facial identity inferred from off-the-shelf estimators. We also propose a novel conditioning technique that eases the modeling of the complex interaction between light and geometry by using a rendered shading reference to spatially modulate the DDIM. We achieve state-of-the-art performance on standard benchmark Multi-PIE and can photorealistically relight in-the-wild images. Please visit our page: https://diffusion-face-relighting.github.io
    Examining the Effectiveness of Chatbots in Gathering Family History Information in Comparison to the Standard In-Person Interview-Based Approach. (arXiv:2309.03223v1 [cs.HC])
    One of the most common things that a genealogist is tasked with is the gathering of a person's initial family history, normally via in-person interviews or with the use of a platform such as ancestry.com, as this can provide a strong foundation upon which a genealogist may build. However, the ability to conduct these interviews can often be hindered by both geographical constraints and the technical proficiency of the interviewee, as the interviewee in these types of interviews is most often an elderly person with a lower than average level of technical proficiency. With this in mind, this study presents what we believe, based on prior research, to be the first chatbot geared entirely towards the gathering of family histories, and explores the viability of utilising such a chatbot by comparing the performance and usability of such a method with the aforementioned alternatives. With a chatbot-based approach, we show that, though the average time taken to conduct an interview may be longer than if the user had used ancestry.com or participated in an in-person interview, the number of mistakes made and the level of confusion from the user regarding the UI and process required is lower than the other two methods. Note that the final metric regarding the user's confusion is not applicable for the in-person interview sessions due to its lack of a UI. With refinement, we believe this use of a chatbot could be a valuable tool for genealogists, especially when dealing with interviewees who are based in other countries where it is not possible to conduct an in-person interview.
    Neural lasso: a unifying approach of lasso and neural networks. (arXiv:2309.03770v1 [stat.ML])
    In recent years, there is a growing interest in combining techniques attributed to the areas of Statistics and Machine Learning in order to obtain the benefits of both approaches. In this article, the statistical technique lasso for variable selection is represented through a neural network. It is observed that, although both the statistical approach and its neural version have the same objective function, they differ due to their optimization. In particular, the neural version is usually optimized in one-step using a single validation set, while the statistical counterpart uses a two-step optimization based on cross-validation. The more elaborated optimization of the statistical method results in more accurate parameter estimation, especially when the training set is small. For this reason, a modification of the standard approach for training neural networks, that mimics the statistical framework, is proposed. During the development of the above modification, a new optimization algorithm for identifying the significant variables emerged. Experimental results, using synthetic and real data sets, show that this new optimization algorithm achieves better performance than any of the three previous optimization approaches.
    Models of human preference for learning reward functions. (arXiv:2206.02231v3 [cs.LG] UPDATED)
    The utility of reinforcement learning is limited by the alignment of reward functions with the interests of human stakeholders. One promising method for alignment is to learn the reward function from human-generated preferences between pairs of trajectory segments, a type of reinforcement learning from human feedback (RLHF). These human preferences are typically assumed to be informed solely by partial return, the sum of rewards along each segment. We find this assumption to be flawed and propose modeling human preferences instead as informed by each segment's regret, a measure of a segment's deviation from optimal decision-making. Given infinitely many preferences generated according to regret, we prove that we can identify a reward function equivalent to the reward function that generated those preferences, and we prove that the previous partial return model lacks this identifiability property in multiple contexts. We empirically show that our proposed regret preference model outperforms the partial return preference model with finite training data in otherwise the same setting. Additionally, we find that our proposed regret preference model better predicts real human preferences and also learns reward functions from these preferences that lead to policies that are better human-aligned. Overall, this work establishes that the choice of preference model is impactful, and our proposed regret preference model provides an improvement upon a core assumption of recent research. We have open sourced our experimental code, the human preferences dataset we gathered, and our training and preference elicitation interfaces for gathering a such a dataset.
    Internet Explorer: Targeted Representation Learning on the Open Web. (arXiv:2302.14051v2 [cs.LG] UPDATED)
    Modern vision models typically rely on fine-tuning general-purpose models pre-trained on large, static datasets. These general-purpose models only capture the knowledge within their pre-training datasets, which are tiny, out-of-date snapshots of the Internet -- where billions of images are uploaded each day. We suggest an alternate approach: rather than hoping our static datasets transfer to our desired tasks after large-scale pre-training, we propose dynamically utilizing the Internet to quickly train a small-scale model that does extremely well on the task at hand. Our approach, called Internet Explorer, explores the web in a self-supervised manner to progressively find relevant examples that improve performance on a desired target dataset. It cycles between searching for images on the Internet with text queries, self-supervised training on downloaded images, determining which images were useful, and prioritizing what to search for next. We evaluate Internet Explorer across several datasets and show that it outperforms or matches CLIP oracle performance by using just a single GPU desktop to actively query the Internet for 30--40 hours. Results, visualizations, and videos at https://internet-explorer-ssl.github.io/
    Off-policy Evaluation in Doubly Inhomogeneous Environments. (arXiv:2306.08719v2 [stat.ME] UPDATED)
    This work aims to study off-policy evaluation (OPE) under scenarios where two key reinforcement learning (RL) assumptions -- temporal stationarity and individual homogeneity are both violated. To handle the ``double inhomogeneities", we propose a class of latent factor models for the reward and observation transition functions, under which we develop a general OPE framework that consists of both model-based and model-free approaches. To our knowledge, this is the first paper that develops statistically sound OPE methods in offline RL with double inhomogeneities. It contributes to a deeper understanding of OPE in environments, where standard RL assumptions are not met, and provides several practical approaches in these settings. We establish the theoretical properties of the proposed value estimators and empirically show that our approach outperforms competing methods that ignore either temporal nonstationarity or individual heterogeneity. Finally, we illustrate our method on a data set from the Medical Information Mart for Intensive Care.
    Evaluation of Reinforcement Learning Techniques for Trading on a Diverse Portfolio. (arXiv:2309.03202v1 [q-fin.TR])
    This work seeks to answer key research questions regarding the viability of reinforcement learning over the S&P 500 index. The on-policy techniques of Value Iteration (VI) and State-action-reward-state-action (SARSA) are implemented along with the off-policy technique of Q-Learning. The models are trained and tested on a dataset comprising multiple years of stock market data from 2000-2023. The analysis presents the results and findings from training and testing the models using two different time periods: one including the COVID-19 pandemic years and one excluding them. The results indicate that including market data from the COVID-19 period in the training dataset leads to superior performance compared to the baseline strategies. During testing, the on-policy approaches (VI and SARSA) outperform Q-learning, highlighting the influence of bias-variance tradeoff and the generalization capabilities of simpler policies. However, it is noted that the performance of Q-learning may vary depending on the stability of future market conditions. Future work is suggested, including experiments with updated Q-learning policies during testing and trading diverse individual stocks. Additionally, the exploration of alternative economic indicators for training the models is proposed.
    How adversarial attacks can disrupt seemingly stable accurate classifiers. (arXiv:2309.03665v1 [cs.LG])
    Adversarial attacks dramatically change the output of an otherwise accurate learning system using a seemingly inconsequential modification to a piece of input data. Paradoxically, empirical evidence indicates that even systems which are robust to large random perturbations of the input data remain susceptible to small, easily constructed, adversarial perturbations of their inputs. Here, we show that this may be seen as a fundamental feature of classifiers working with high dimensional input data. We introduce a simple generic and generalisable framework for which key behaviours observed in practical systems arise with high probability -- notably the simultaneous susceptibility of the (otherwise accurate) model to easily constructed adversarial attacks, and robustness to random perturbations of the input data. We confirm that the same phenomena are directly observed in practical neural networks trained on standard image classification problems, where even large additive random noise fails to trigger the adversarial instability of the network. A surprising takeaway is that even small margins separating a classifier's decision surface from training and testing data can hide adversarial susceptibility from being detected using randomly sampled perturbations. Counterintuitively, using additive noise during training or testing is therefore inefficient for eradicating or detecting adversarial examples, and more demanding adversarial training is required.
    LB-SimTSC: An Efficient Similarity-Aware Graph Neural Network for Semi-Supervised Time Series Classification. (arXiv:2301.04838v3 [cs.LG] UPDATED)
    Time series classification is an important data mining task that has received a lot of interest in the past two decades. Due to the label scarcity in practice, semi-supervised time series classification with only a few labeled samples has become popular. Recently, Similarity-aware Time Series Classification (SimTSC) is proposed to address this problem by using a graph neural network classification model on the graph generated from pairwise Dynamic Time Warping (DTW) distance of batch data. It shows excellent accuracy and outperforms state-of-the-art deep learning models in several few-label settings. However, since SimTSC relies on pairwise DTW distances, the quadratic complexity of DTW limits its usability to only reasonably sized datasets. To address this challenge, we propose a new efficient semi-supervised time series classification technique, LB-SimTSC, with a new graph construction module. Instead of using DTW, we propose to utilize a lower bound of DTW, LB_Keogh, to approximate the dissimilarity between instances in linear time, while retaining the relative proximity relationships one would have obtained via computing DTW. We construct the pairwise distance matrix using LB_Keogh and build a graph for the graph neural network. We apply this approach to the ten largest datasets from the well-known UCR time series classification archive. The results demonstrate that this approach can be up to 104x faster than SimTSC when constructing the graph on large datasets without significantly decreasing classification accuracy.  ( 3 min )
    Learning a Patent-Informed Biomedical Knowledge Graph Reveals Technological Potential of Drug Repositioning Candidates. (arXiv:2309.03227v1 [cs.AI])
    Drug repositioning-a promising strategy for discovering new therapeutic uses for existing drugs-has been increasingly explored in the computational science literature using biomedical databases. However, the technological potential of drug repositioning candidates has often been overlooked. This study presents a novel protocol to comprehensively analyse various sources such as pharmaceutical patents and biomedical databases, and identify drug repositioning candidates with both technological potential and scientific evidence. To this end, first, we constructed a scientific biomedical knowledge graph (s-BKG) comprising relationships between drugs, diseases, and genes derived from biomedical databases. Our protocol involves identifying drugs that exhibit limited association with the target disease but are closely located in the s-BKG, as potential drug candidates. We constructed a patent-informed biomedical knowledge graph (p-BKG) by adding pharmaceutical patent information. Finally, we developed a graph embedding protocol to ascertain the structure of the p-BKG, thereby calculating the relevance scores of those candidates with target disease-related patents to evaluate their technological potential. Our case study on Alzheimer's disease demonstrates its efficacy and feasibility, while the quantitative outcomes and systematic methods are expected to bridge the gap between computational discoveries and successful market applications in drug repositioning research.  ( 2 min )
    Insights Into the Inner Workings of Transformer Models for Protein Function Prediction. (arXiv:2309.03631v1 [cs.LG])
    Motivation: We explored how explainable AI (XAI) can help to shed light into the inner workings of neural networks for protein function prediction, by extending the widely used XAI method of integrated gradients such that latent representations inside of transformer models, which were finetuned to Gene Ontology term and Enzyme Commission number prediction, can be inspected too. Results: The approach enabled us to identify amino acids in the sequences that the transformers pay particular attention to, and to show that these relevant sequence parts reflect expectations from biology and chemistry, both in the embedding layer and inside of the model, where we identified transformer heads with a statistically significant correspondence of attribution maps with ground truth sequence annotations (e.g., transmembrane regions, active sites) across many proteins. Availability and Implementation: Source code can be accessed at https://github.com/markuswenzel/xai-proteins .  ( 2 min )
    Mixtures of Gaussians are Privately Learnable with a Polynomial Number of Samples. (arXiv:2309.03847v1 [stat.ML])
    We study the problem of estimating mixtures of Gaussians under the constraint of differential privacy (DP). Our main result is that $\tilde{O}(k^2 d^4 \log(1/\delta) / \alpha^2 \varepsilon)$ samples are sufficient to estimate a mixture of $k$ Gaussians up to total variation distance $\alpha$ while satisfying $(\varepsilon, \delta)$-DP. This is the first finite sample complexity upper bound for the problem that does not make any structural assumptions on the GMMs. To solve the problem, we devise a new framework which may be useful for other tasks. On a high level, we show that if a class of distributions (such as Gaussians) is (1) list decodable and (2) admits a "locally small'' cover [BKSW19] with respect to total variation distance, then the class of its mixtures is privately learnable. The proof circumvents a known barrier indicating that, unlike Gaussians, GMMs do not admit a locally small cover [AAL21].  ( 2 min )
    MVD:A Novel Methodology and Dataset for Acoustic Vehicle Type Classification. (arXiv:2309.03544v1 [cs.SD])
    Rising urban populations have led to a surge in vehicle use and made traffic monitoring and management indispensable. Acoustic traffic monitoring (ATM) offers a cost-effective and efficient alternative to more computationally expensive methods of monitoring traffic such as those involving computer vision technologies. In this paper, we present MVD and MVDA: two open datasets for the development of acoustic traffic monitoring and vehicle-type classification algorithms, which contain audio recordings of moving vehicles. The dataset contain four classes- Trucks, Cars, Motorbikes, and a No-vehicle class. Additionally, we propose a novel and efficient way to accurately classify these acoustic signals using cepstrum and spectrum based local and global audio features, and a multi-input neural network. Experimental results show that our methodology improves upon the established baselines of previous works and achieves an accuracy of 91.98% and 96.66% on MVD and MVDA Datasets, respectively. Finally, the proposed model was deployed through an Android application to make it accessible for testing and demonstrate its efficacy.  ( 2 min )
    DiffDefense: Defending against Adversarial Attacks via Diffusion Models. (arXiv:2309.03702v1 [cs.LG])
    This paper presents a novel reconstruction method that leverages Diffusion Models to protect machine learning classifiers against adversarial attacks, all without requiring any modifications to the classifiers themselves. The susceptibility of machine learning models to minor input perturbations renders them vulnerable to adversarial attacks. While diffusion-based methods are typically disregarded for adversarial defense due to their slow reverse process, this paper demonstrates that our proposed method offers robustness against adversarial threats while preserving clean accuracy, speed, and plug-and-play compatibility. Code at: https://github.com/HondamunigePrasannaSilva/DiffDefence.  ( 2 min )
    Kernelized Concept Erasure. (arXiv:2201.12191v4 [cs.LG] UPDATED)
    The representation space of neural models for textual data emerges in an unsupervised manner during training. Understanding how those representations encode human-interpretable concepts is a fundamental problem. One prominent approach for the identification of concepts in neural representations is searching for a linear subspace whose erasure prevents the prediction of the concept from the representations. However, while many linear erasure algorithms are tractable and interpretable, neural networks do not necessarily represent concepts in a linear manner. To identify non-linearly encoded concepts, we propose a kernelization of a linear minimax game for concept erasure. We demonstrate that it is possible to prevent specific non-linear adversaries from predicting the concept. However, the protection does not transfer to different nonlinear adversaries. Therefore, exhaustively erasing a non-linearly encoded concept remains an open problem.  ( 2 min )
    Retail store customer behavior analysis system: Design and Implementation. (arXiv:2309.03232v1 [cs.LG])
    Understanding customer behavior in retail stores plays a crucial role in improving customer satisfaction by adding personalized value to services. Behavior analysis reveals both general and detailed patterns in the interaction of customers with a store items and other people, providing store managers with insight into customer preferences. Several solutions aim to utilize this data by recognizing specific behaviors through statistical visualization. However, current approaches are limited to the analysis of small customer behavior sets, utilizing conventional methods to detect behaviors. They do not use deep learning techniques such as deep neural networks, which are powerful methods in the field of computer vision. Furthermore, these methods provide limited figures when visualizing the behavioral data acquired by the system. In this study, we propose a framework that includes three primary parts: mathematical modeling of customer behaviors, behavior analysis using an efficient deep learning based system, and individual and group behavior visualization. Each module and the entire system were validated using data from actual situations in a retail store.  ( 2 min )
    PGFed: Personalize Each Client's Global Objective for Federated Learning. (arXiv:2212.01448v2 [cs.LG] UPDATED)
    Personalized federated learning has received an upsurge of attention due to the mediocre performance of conventional federated learning (FL) over heterogeneous data. Unlike conventional FL which trains a single global consensus model, personalized FL allows different models for different clients. However, existing personalized FL algorithms only implicitly transfer the collaborative knowledge across the federation by embedding the knowledge into the aggregated model or regularization. We observed that this implicit knowledge transfer fails to maximize the potential of each client's empirical risk toward other clients. Based on our observation, in this work, we propose Personalized Global Federated Learning (PGFed), a novel personalized FL framework that enables each client to personalize its own global objective by explicitly and adaptively aggregating the empirical risks of itself and other clients. To avoid massive (O(N^2)) communication overhead and potential privacy leakage while achieving this, each client's risk is estimated through a first-order approximation for other clients' adaptive risk aggregation. On top of PGFed, we develop a momentum upgrade, dubbed PGFedMo, to more efficiently utilize clients' empirical risks. Our extensive experiments on four datasets under different federated settings show consistent improvements of PGFed over previous state-of-the-art methods. The code is publicly available at https://github.com/ljaiverson/pgfed.  ( 2 min )
    Auto-SDE: Learning effective reduced dynamics from data-driven stochastic dynamical systems. (arXiv:2205.04151v2 [stat.ML] UPDATED)
    Multiscale stochastic dynamical systems have been widely adopted to scientific and engineering problems due to their capability of depicting complex phenomena in many real world applications. This work is devoted to investigating the effective reduced dynamics for a slow-fast stochastic dynamical system. Given observation data on a short-term period satisfying some unknown slow-fast stochastic system, we propose a novel algorithm including a neural network called Auto-SDE to learn invariant slow manifold. Our approach captures the evolutionary nature of a series of time-dependent autoencoder neural networks with the loss constructed from a discretized stochastic differential equation. Our algorithm is also proved to be accurate, stable and effective through numerical experiments under various evaluation metrics.  ( 2 min )
    A Function Interpretation Benchmark for Evaluating Interpretability Methods. (arXiv:2309.03886v1 [cs.CL])
    Labeling neural network submodules with human-legible descriptions is useful for many downstream tasks: such descriptions can surface failures, guide interventions, and perhaps even explain important model behaviors. To date, most mechanistic descriptions of trained networks have involved small models, narrowly delimited phenomena, and large amounts of human labor. Labeling all human-interpretable sub-computations in models of increasing size and complexity will almost certainly require tools that can generate and validate descriptions automatically. Recently, techniques that use learned models in-the-loop for labeling have begun to gain traction, but methods for evaluating their efficacy are limited and ad-hoc. How should we validate and compare open-ended labeling tools? This paper introduces FIND (Function INterpretation and Description), a benchmark suite for evaluating the building blocks of automated interpretability methods. FIND contains functions that resemble components of trained neural networks, and accompanying descriptions of the kind we seek to generate. The functions are procedurally constructed across textual and numeric domains, and involve a range of real-world complexities, including noise, composition, approximation, and bias. We evaluate new and existing methods that use language models (LMs) to produce code-based and language descriptions of function behavior. We find that an off-the-shelf LM augmented with only black-box access to functions can sometimes infer their structure, acting as a scientist by forming hypotheses, proposing experiments, and updating descriptions in light of new data. However, LM-based descriptions tend to capture global function behavior and miss local corruptions. These results show that FIND will be useful for characterizing the performance of more sophisticated interpretability methods before they are applied to real-world models.  ( 3 min )
    Early warning via transitions in latent stochastic dynamical systems. (arXiv:2309.03842v1 [stat.ML])
    Early warnings for dynamical transitions in complex systems or high-dimensional observation data are essential in many real world applications, such as gene mutation, brain diseases, natural disasters, financial crises, and engineering reliability. To effectively extract early warning signals, we develop a novel approach: the directed anisotropic diffusion map that captures the latent evolutionary dynamics in low-dimensional manifold. Applying the methodology to authentic electroencephalogram (EEG) data, we successfully find the appropriate effective coordinates, and derive early warning signals capable of detecting the tipping point during the state transition. Our method bridges the latent dynamics with the original dataset. The framework is validated to be accurate and effective through numerical experiments, in terms of density and transition probability. It is shown that the second coordinate holds meaningful information for critical transition in various evaluation metrics.  ( 2 min )
    XGen-7B Technical Report. (arXiv:2309.03450v1 [cs.CL])
    Large Language Models (LLMs) have become ubiquitous across various domains, transforming the way we interact with information and conduct research. However, most high-performing LLMs remain confined behind proprietary walls, hindering scientific progress. Most open-source LLMs, on the other hand, are limited in their ability to support longer sequence lengths, which is a key requirement for many tasks that require inference over an input context. To address this, we have trained XGen, a series of 7B parameter models on up to 8K sequence length for up to 1.5T tokens. We have also finetuned the XGen models on public-domain instructional data, creating their instruction-tuned counterparts (XGen-Inst). We open-source our models for both research advancements and commercial applications. Our evaluation on standard benchmarks shows that XGen models achieve comparable or better results when compared with state-of-the-art open-source LLMs. Our targeted evaluation on long sequence modeling tasks shows the benefits of our 8K-sequence models over 2K-sequence open-source LLMs.  ( 2 min )
    ArtHDR-Net: Perceptually Realistic and Accurate HDR Content Creation. (arXiv:2309.03827v1 [cs.CV])
    High Dynamic Range (HDR) content creation has become an important topic for modern media and entertainment sectors, gaming and Augmented/Virtual Reality industries. Many methods have been proposed to recreate the HDR counterparts of input Low Dynamic Range (LDR) images/videos given a single exposure or multi-exposure LDRs. The state-of-the-art methods focus primarily on the preservation of the reconstruction's structural similarity and the pixel-wise accuracy. However, these conventional approaches do not emphasize preserving the artistic intent of the images in terms of human visual perception, which is an essential element in media, entertainment and gaming. In this paper, we attempt to study and fill this gap. We propose an architecture called ArtHDR-Net based on a Convolutional Neural Network that uses multi-exposed LDR features as input. Experimental results show that ArtHDR-Net can achieve state-of-the-art performance in terms of the HDR-VDP-2 score (i.e., mean opinion score index) while reaching competitive performance in terms of PSNR and SSIM.  ( 2 min )
    Subgraph-based Tight Frames on Graphs with Compact Supports and Vanishing Moments. (arXiv:2309.03537v1 [eess.SP])
    In this work, we proposed a novel and general method to construct tight frames on graphs with compact supports based on a series of hierarchical partitions. Starting from our abstract construction that generalizes previous methods based on partition trees, we are able to flexibly incorporate subgraph Laplacians into our design of graph frames. Consequently, our general methods permit adjusting the (subgraph) vanishing moments of the framelets and extra properties, such as directionality, for efficiently representing graph signals with path-like supports. Several variants are explicitly defined and tested. Experimental results show our proposed graph frames perform superiorly in non-linear approximation tasks.  ( 2 min )
    Graph Theory Applications in Advanced Geospatial Research. (arXiv:2309.03249v1 [cs.LG])
    Geospatial sciences include a wide range of applications, from environmental monitoring transportation to infrastructure planning, as well as location-based analysis and services. Graph theory algorithms in mathematics have emerged as indispensable tools in these domains due to their capability to model and analyse spatial relationships efficiently. This technical report explores the applications of graph theory algorithms in geospatial sciences, highlighting their role in network analysis, spatial connectivity, geographic information systems, and various other spatial problem-solving scenarios. It provides a comprehensive idea about the key concepts and algorithms of graph theory that assist the modelling processes. The report provides insights into the practical significance of graph theory in addressing real-world geospatial challenges and opportunities. It lists the extensive research, innovative technologies and methodologies implemented in this field.  ( 2 min )
    Cross-Task Attention Network: Improving Multi-Task Learning for Medical Imaging Applications. (arXiv:2309.03837v1 [cs.CV])
    Multi-task learning (MTL) is a powerful approach in deep learning that leverages the information from multiple tasks during training to improve model performance. In medical imaging, MTL has shown great potential to solve various tasks. However, existing MTL architectures in medical imaging are limited in sharing information across tasks, reducing the potential performance improvements of MTL. In this study, we introduce a novel attention-based MTL framework to better leverage inter-task interactions for various tasks from pixel-level to image-level predictions. Specifically, we propose a Cross-Task Attention Network (CTAN) which utilizes cross-task attention mechanisms to incorporate information by interacting across tasks. We validated CTAN on four medical imaging datasets that span different domains and tasks including: radiation treatment planning prediction using planning CT images of two different target cancers (Prostate, OpenKBP); pigmented skin lesion segmentation and diagnosis using dermatoscopic images (HAM10000); and COVID-19 diagnosis and severity prediction using chest CT scans (STOIC). Our study demonstrates the effectiveness of CTAN in improving the accuracy of medical imaging tasks. Compared to standard single-task learning (STL), CTAN demonstrated a 4.67% improvement in performance and outperformed both widely used MTL baselines: hard parameter sharing (HPS) with an average performance improvement of 3.22%; and multi-task attention network (MTAN) with a relative decrease of 5.38%. These findings highlight the significance of our proposed MTL framework in solving medical imaging tasks and its potential to improve their accuracy across domains.  ( 3 min )
    Companion Animal Disease Diagnostics based on Literal-aware Medical Knowledge Graph Representation Learning. (arXiv:2309.03219v1 [cs.AI])
    Knowledge graph (KG) embedding has been used to benefit the diagnosis of animal diseases by analyzing electronic medical records (EMRs), such as notes and veterinary records. However, learning representations to capture entities and relations with literal information in KGs is challenging as the KGs show heterogeneous properties and various types of literal information. Meanwhile, the existing methods mostly aim to preserve graph structures surrounding target nodes without considering different types of literals, which could also carry significant information. In this paper, we propose a knowledge graph embedding model for the efficient diagnosis of animal diseases, which could learn various types of literal information and graph structure and fuse them into unified representations, namely LiteralKG. Specifically, we construct a knowledge graph that is built from EMRs along with literal information collected from various animal hospitals. We then fuse different types of entities and node feature information into unified vector representations through gate networks. Finally, we propose a self-supervised learning task to learn graph structure in pretext tasks and then towards various downstream tasks. Experimental results on link prediction tasks demonstrate that our model outperforms the baselines that consist of state-of-the-art models. The source code is available at https://github.com/NSLab-CUK/LiteralKG.  ( 2 min )
    EvoCLINICAL: Evolving Cyber-Cyber Digital Twin with Active Transfer Learning for Automated Cancer Registry System. (arXiv:2309.03246v1 [cs.LG])
    The Cancer Registry of Norway (CRN) collects information on cancer patients by receiving cancer messages from different medical entities (e.g., medical labs, and hospitals) in Norway. Such messages are validated by an automated cancer registry system: GURI. Its correct operation is crucial since it lays the foundation for cancer research and provides critical cancer-related statistics to its stakeholders. Constructing a cyber-cyber digital twin (CCDT) for GURI can facilitate various experiments and advanced analyses of the operational state of GURI without requiring intensive interactions with the real system. However, GURI constantly evolves due to novel medical diagnostics and treatment, technological advances, etc. Accordingly, CCDT should evolve as well to synchronize with GURI. A key challenge of achieving such synchronization is that evolving CCDT needs abundant data labelled by the new GURI. To tackle this challenge, we propose EvoCLINICAL, which considers the CCDT developed for the previous version of GURI as the pretrained model and fine-tunes it with the dataset labelled by querying a new GURI version. EvoCLINICAL employs a genetic algorithm to select an optimal subset of cancer messages from a candidate dataset and query GURI with it. We evaluate EvoCLINICAL on three evolution processes. The precision, recall, and F1 score are all greater than 91%, demonstrating the effectiveness of EvoCLINICAL. Furthermore, we replace the active learning part of EvoCLINICAL with random selection to study the contribution of transfer learning to the overall performance of EvoCLINICAL. Results show that employing active learning in EvoCLINICAL increases its performances consistently.  ( 3 min )
  • Open

    Deep Metric Learning with Chance Constraints. (arXiv:2209.09060v3 [cs.CV] CROSS LISTED)
    Deep metric learning (DML) aims to minimize empirical expected loss of the pairwise intra-/inter- class proximity violations in the embedding space. We relate DML to feasibility problem of finite chance constraints. We show that minimizer of proxy-based DML satisfies certain chance constraints, and that the worst case generalization performance of the proxy-based methods can be characterized by the radius of the smallest ball around a class proxy to cover the entire domain of the corresponding class samples, suggesting multiple proxies per class helps performance. To provide a scalable algorithm as well as exploiting more proxies, we consider the chance constraints implied by the minimizers of proxy-based DML instances and reformulate DML as finding a feasible point in intersection of such constraints, resulting in a problem to be approximately solved by iterative projections. Simply put, we repeatedly train a regularized proxy-based loss and re-initialize the proxies with the embeddings of the deliberately selected new samples. We applied our method with 4 well-accepted DML losses and show the effectiveness with extensive evaluations on 4 popular DML benchmarks. Code is available at: https://github.com/yetigurbuz/ccp-dml
    Neural lasso: a unifying approach of lasso and neural networks. (arXiv:2309.03770v1 [stat.ML])
    In recent years, there is a growing interest in combining techniques attributed to the areas of Statistics and Machine Learning in order to obtain the benefits of both approaches. In this article, the statistical technique lasso for variable selection is represented through a neural network. It is observed that, although both the statistical approach and its neural version have the same objective function, they differ due to their optimization. In particular, the neural version is usually optimized in one-step using a single validation set, while the statistical counterpart uses a two-step optimization based on cross-validation. The more elaborated optimization of the statistical method results in more accurate parameter estimation, especially when the training set is small. For this reason, a modification of the standard approach for training neural networks, that mimics the statistical framework, is proposed. During the development of the above modification, a new optimization algorithm for identifying the significant variables emerged. Experimental results, using synthetic and real data sets, show that this new optimization algorithm achieves better performance than any of the three previous optimization approaches.
    Enhancing Pipeline-Based Conversational Agents with Large Language Models. (arXiv:2309.03748v1 [cs.CL])
    The latest advancements in AI and deep learning have led to a breakthrough in large language model (LLM)-based agents such as GPT-4. However, many commercial conversational agent development tools are pipeline-based and have limitations in holding a human-like conversation. This paper investigates the capabilities of LLMs to enhance pipeline-based conversational agents during two phases: 1) in the design and development phase and 2) during operations. In 1) LLMs can aid in generating training data, extracting entities and synonyms, localization, and persona design. In 2) LLMs can assist in contextualization, intent classification to prevent conversational breakdown and handle out-of-scope questions, auto-correcting utterances, rephrasing responses, formulating disambiguation questions, summarization, and enabling closed question-answering capabilities. We conducted informal experiments with GPT-4 in the private banking domain to demonstrate the scenarios above with a practical example. Companies may be hesitant to replace their pipeline-based agents with LLMs entirely due to privacy concerns and the need for deep integration within their existing ecosystems. A hybrid approach in which LLMs' are integrated into the pipeline-based agents allows them to save time and costs of building and running agents by capitalizing on the capabilities of LLMs while retaining the integration and privacy safeguards of their existing systems.
    Truncated Diffusion Probabilistic Models and Diffusion-based Adversarial Auto-Encoders. (arXiv:2202.09671v4 [stat.ML] UPDATED)
    Employing a forward diffusion chain to gradually map the data to a noise distribution, diffusion-based generative models learn how to generate the data by inferring a reverse diffusion chain. However, this approach is slow and costly because it needs many forward and reverse steps. We propose a faster and cheaper approach that adds noise not until the data become pure random noise, but until they reach a hidden noisy data distribution that we can confidently learn. Then, we use fewer reverse steps to generate data by starting from this hidden distribution that is made similar to the noisy data. We reveal that the proposed model can be cast as an adversarial auto-encoder empowered by both the diffusion process and a learnable implicit prior. Experimental results show even with a significantly smaller number of reverse diffusion steps, the proposed truncated diffusion probabilistic models can provide consistent improvements over the non-truncated ones in terms of performance in both unconditional and text-guided image generations.
    Explanation Shift: How Did the Distribution Shift Impact the Model?. (arXiv:2303.08081v2 [cs.LG] UPDATED)
    As input data distributions evolve, the predictive performance of machine learning models tends to deteriorate. In practice, new input data tend to come without target labels. Then, state-of-the-art techniques model input data distributions or model prediction distributions and try to understand issues regarding the interactions between learned models and shifting distributions. We suggest a novel approach that models how explanation characteristics shift when affected by distribution shifts. We find that the modeling of explanation shifts can be a better indicator for detecting out-of-distribution model behaviour than state-of-the-art techniques. We analyze different types of distribution shifts using synthetic examples and real-world data sets. We provide an algorithmic method that allows us to inspect the interaction between data set features and learned models and compare them to the state-of-the-art. We release our methods in an open-source Python package, as well as the code used to reproduce our experiments.
    Comparing Sequential Forecasters. (arXiv:2110.00115v5 [stat.ME] UPDATED)
    Consider two forecasters, each making a single prediction for a sequence of events over time. We ask a relatively basic question: how might we compare these forecasters, either online or post-hoc, while avoiding unverifiable assumptions on how the forecasts and outcomes were generated? In this paper, we present a rigorous answer to this question by designing novel sequential inference procedures for estimating the time-varying difference in forecast scores. To do this, we employ confidence sequences (CS), which are sequences of confidence intervals that can be continuously monitored and are valid at arbitrary data-dependent stopping times ("anytime-valid"). The widths of our CSs are adaptive to the underlying variance of the score differences. Underlying their construction is a game-theoretic statistical framework, in which we further identify e-processes and p-processes for sequentially testing a weak null hypothesis -- whether one forecaster outperforms another on average (rather than always). Our methods do not make distributional assumptions on the forecasts or outcomes; our main theorems apply to any bounded scores, and we later provide alternative methods for unbounded scores. We empirically validate our approaches by comparing real-world baseball and weather forecasters.
    Improved theoretical guarantee for rank aggregation via spectral method. (arXiv:2309.03808v1 [stat.ML])
    Given pairwise comparisons between multiple items, how to rank them so that the ranking matches the observations? This problem, known as rank aggregation, has found many applications in sports, recommendation systems, and other web applications. As it is generally NP-hard to find a global ranking that minimizes the mismatch (known as the Kemeny optimization), we focus on the Erd\"os-R\'enyi outliers (ERO) model for this ranking problem. Here, each pairwise comparison is a corrupted copy of the true score difference. We investigate spectral ranking algorithms that are based on unnormalized and normalized data matrices. The key is to understand their performance in recovering the underlying scores of each item from the observed data. This reduces to deriving an entry-wise perturbation error bound between the top eigenvectors of the unnormalized/normalized data matrix and its population counterpart. By using the leave-one-out technique, we provide a sharper $\ell_{\infty}$-norm perturbation bound of the eigenvectors and also derive an error bound on the maximum displacement for each item, with only $\Omega(n\log n)$ samples. Our theoretical analysis improves upon the state-of-the-art results in terms of sample complexity, and our numerical experiments confirm these theoretical findings.
    BoXHED2.0: Scalable boosting of dynamic survival analysis. (arXiv:2103.12591v5 [cs.LG] UPDATED)
    Modern applications of survival analysis increasingly involve time-dependent covariates. The Python package BoXHED2.0 is a tree-boosted hazard estimator that is fully nonparametric, and is applicable to survival settings far more general than right-censoring, including recurring events and competing risks. BoXHED2.0 is also scalable to the point of being on the same order of speed as parametric boosted survival models, in part because its core is written in C++ and it also supports the use of GPUs and multicore CPUs. BoXHED2.0 is available from PyPI and also from www.github.com/BoXHED.
    Ensemble linear interpolators: The role of ensembling. (arXiv:2309.03354v1 [stat.ML])
    Interpolators are unstable. For example, the mininum $\ell_2$ norm least square interpolator exhibits unbounded test errors when dealing with noisy data. In this paper, we study how ensemble stabilizes and thus improves the generalization performance, measured by the out-of-sample prediction risk, of an individual interpolator. We focus on bagged linear interpolators, as bagging is a popular randomization-based ensemble method that can be implemented in parallel. We introduce the multiplier-bootstrap-based bagged least square estimator, which can then be formulated as an average of the sketched least square estimators. The proposed multiplier bootstrap encompasses the classical bootstrap with replacement as a special case, along with a more intriguing variant which we call the Bernoulli bootstrap. Focusing on the proportional regime where the sample size scales proportionally with the feature dimensionality, we investigate the out-of-sample prediction risks of the sketched and bagged least square estimators in both underparametrized and overparameterized regimes. Our results reveal the statistical roles of sketching and bagging. In particular, sketching modifies the aspect ratio and shifts the interpolation threshold of the minimum $\ell_2$ norm estimator. However, the risk of the sketched estimator continues to be unbounded around the interpolation threshold due to excessive variance. In stark contrast, bagging effectively mitigates this variance, leading to a bounded limiting out-of-sample prediction risk. To further understand this stability improvement property, we establish that bagging acts as a form of implicit regularization, substantiated by the equivalence of the bagged estimator with its explicitly regularized counterpart. We also discuss several extensions.
    Proper Learning of Linear Dynamical Systems as a Non-Commutative Polynomial Optimisation Problem. (arXiv:2002.01444v5 [math.OC] UPDATED)
    There has been much recent progress in forecasting the next observation of a linear dynamical system (LDS), which is known as the improper learning, as well as in the estimation of its system matrices, which is known as the proper learning of LDS. We present an approach to proper learning of LDS, which in spite of the non-convexity of the problem, guarantees global convergence of numerical solutions to a least-squares estimator. We present promising computational results.
    Bridging the Gap Between Target Networks and Functional Regularization. (arXiv:2106.02613v4 [stat.ML] UPDATED)
    Bootstrapping is behind much of the successes of deep Reinforcement Learning. However, learning the value function via bootstrapping often leads to unstable training due to fast-changing target values. Target Networks are employed to stabilize training by using an additional set of lagging parameters to estimate the target values. Despite the popularity of Target Networks, their effect on the optimization is still misunderstood. In this work, we show that they act as an implicit regularizer which can be beneficial in some cases, but also have disadvantages such as being inflexible and can result in instabilities, even when vanilla TD(0) converges. To overcome these issues, we propose an explicit Functional Regularization alternative that is flexible and a convex regularizer in function space and we theoretically study its convergence. We conduct an experimental study across a range of environments, discount factors, and off-policiness data collections to investigate the effectiveness of the regularization induced by Target Networks and Functional Regularization in terms of performance, accuracy, and stability. Our findings emphasize that Functional Regularization can be used as a drop-in replacement for Target Networks and result in performance improvement. Furthermore, adjusting both the regularization weight and the network update period in Functional Regularization can result in further performance improvements compared to solely adjusting the network update period as typically done with Target Networks. Our approach also enhances the ability to networks to recover accurate $Q$-values.
    A Probabilistic Semi-Supervised Approach with Triplet Markov Chains. (arXiv:2309.03707v1 [stat.ML])
    Triplet Markov chains are general generative models for sequential data which take into account three kinds of random variables: (noisy) observations, their associated discrete labels and latent variables which aim at strengthening the distribution of the observations and their associated labels. However, in practice, we do not have at our disposal all the labels associated to the observations to estimate the parameters of such models. In this paper, we propose a general framework based on a variational Bayesian inference to train parameterized triplet Markov chain models in a semi-supervised context. The generality of our approach enables us to derive semi-supervised algorithms for a variety of generative models for sequential Bayesian classification.
    Global Optimization for Cardinality-constrained Minimum Sum-of-Squares Clustering via Semidefinite Programming. (arXiv:2209.08901v3 [math.OC] UPDATED)
    The minimum sum-of-squares clustering (MSSC), or k-means type clustering, has been recently extended to exploit prior knowledge on the cardinality of each cluster. Such knowledge is used to increase performance as well as solution quality. In this paper, we propose a global optimization approach based on the branch-and-cut technique to solve the cardinality-constrained MSSC. For the lower bound routine, we use the semidefinite programming (SDP) relaxation recently proposed by Rujeerapaiboon et al. [SIAM J. Optim. 29(2), 1211-1239, (2019)]. However, this relaxation can be used in a branch-and-cut method only for small-size instances. Therefore, we derive a new SDP relaxation that scales better with the instance size and the number of clusters. In both cases, we strengthen the bound by adding polyhedral cuts. Benefiting from a tailored branching strategy which enforces pairwise constraints, we reduce the complexity of the problems arising in the children nodes. For the upper bound, instead, we present a local search procedure that exploits the solution of the SDP relaxation solved at each node. Computational results show that the proposed algorithm globally solves, for the first time, real-world instances of size 10 times larger than those solved by state-of-the-art exact methods.
    Empirical Risk Minimization for Losses without Variance. (arXiv:2309.03818v1 [stat.ML])
    This paper considers an empirical risk minimization problem under heavy-tailed settings, where data does not have finite variance, but only has $p$-th moment with $p \in (1,2)$. Instead of using estimation procedure based on truncated observed data, we choose the optimizer by minimizing the risk value. Those risk values can be robustly estimated via using the remarkable Catoni's method (Catoni, 2012). Thanks to the structure of Catoni-type influence functions, we are able to establish excess risk upper bounds via using generalized generic chaining methods. Moreover, we take computational issues into consideration. We especially theoretically investigate two types of optimization methods, robust gradient descent algorithm and empirical risk-based methods. With an extensive numerical study, we find that the optimizer based on empirical risks via Catoni-style estimation indeed shows better performance than other baselines. It indicates that estimation directly based on truncated data may lead to unsatisfactory results.
    Early warning via transitions in latent stochastic dynamical systems. (arXiv:2309.03842v1 [stat.ML])
    Early warnings for dynamical transitions in complex systems or high-dimensional observation data are essential in many real world applications, such as gene mutation, brain diseases, natural disasters, financial crises, and engineering reliability. To effectively extract early warning signals, we develop a novel approach: the directed anisotropic diffusion map that captures the latent evolutionary dynamics in low-dimensional manifold. Applying the methodology to authentic electroencephalogram (EEG) data, we successfully find the appropriate effective coordinates, and derive early warning signals capable of detecting the tipping point during the state transition. Our method bridges the latent dynamics with the original dataset. The framework is validated to be accurate and effective through numerical experiments, in terms of density and transition probability. It is shown that the second coordinate holds meaningful information for critical transition in various evaluation metrics.
    Auto-SDE: Learning effective reduced dynamics from data-driven stochastic dynamical systems. (arXiv:2205.04151v2 [stat.ML] UPDATED)
    Multiscale stochastic dynamical systems have been widely adopted to scientific and engineering problems due to their capability of depicting complex phenomena in many real world applications. This work is devoted to investigating the effective reduced dynamics for a slow-fast stochastic dynamical system. Given observation data on a short-term period satisfying some unknown slow-fast stochastic system, we propose a novel algorithm including a neural network called Auto-SDE to learn invariant slow manifold. Our approach captures the evolutionary nature of a series of time-dependent autoencoder neural networks with the loss constructed from a discretized stochastic differential equation. Our algorithm is also proved to be accurate, stable and effective through numerical experiments under various evaluation metrics.
    Deep Network Approximation: Beyond ReLU to Diverse Activation Functions. (arXiv:2307.06555v3 [cs.LG] UPDATED)
    This paper explores the expressive power of deep neural networks for a diverse range of activation functions. An activation function set $\mathscr{A}$ is defined to encompass the majority of commonly used activation functions, such as $\mathtt{ReLU}$, $\mathtt{LeakyReLU}$, $\mathtt{ReLU}^2$, $\mathtt{ELU}$, $\mathtt{SELU}$, $\mathtt{Softplus}$, $\mathtt{GELU}$, $\mathtt{SiLU}$, $\mathtt{Swish}$, $\mathtt{Mish}$, $\mathtt{Sigmoid}$, $\mathtt{Tanh}$, $\mathtt{Arctan}$, $\mathtt{Softsign}$, $\mathtt{dSiLU}$, and $\mathtt{SRS}$. We demonstrate that for any activation function $\varrho\in \mathscr{A}$, a $\mathtt{ReLU}$ network of width $N$ and depth $L$ can be approximated to arbitrary precision by a $\varrho$-activated network of width $4N$ and depth $2L$ on any bounded set. This finding enables the extension of most approximation results achieved with $\mathtt{ReLU}$ networks to a wide variety of other activation functions, at the cost of slightly larger constants.
    Mixtures of Gaussians are Privately Learnable with a Polynomial Number of Samples. (arXiv:2309.03847v1 [stat.ML])
    We study the problem of estimating mixtures of Gaussians under the constraint of differential privacy (DP). Our main result is that $\tilde{O}(k^2 d^4 \log(1/\delta) / \alpha^2 \varepsilon)$ samples are sufficient to estimate a mixture of $k$ Gaussians up to total variation distance $\alpha$ while satisfying $(\varepsilon, \delta)$-DP. This is the first finite sample complexity upper bound for the problem that does not make any structural assumptions on the GMMs. To solve the problem, we devise a new framework which may be useful for other tasks. On a high level, we show that if a class of distributions (such as Gaussians) is (1) list decodable and (2) admits a "locally small'' cover [BKSW19] with respect to total variation distance, then the class of its mixtures is privately learnable. The proof circumvents a known barrier indicating that, unlike Gaussians, GMMs do not admit a locally small cover [AAL21].
    A Tutorial on the Non-Asymptotic Theory of System Identification. (arXiv:2309.03873v1 [eess.SY])
    This tutorial serves as an introduction to recently developed non-asymptotic methods in the theory of -- mainly linear -- system identification. We emphasize tools we deem particularly useful for a range of problems in this domain, such as the covering technique, the Hanson-Wright Inequality and the method of self-normalized martingales. We then employ these tools to give streamlined proofs of the performance of various least-squares based estimators for identifying the parameters in autoregressive models. We conclude by sketching out how the ideas presented herein can be extended to certain nonlinear identification problems.
    Gradient-Based Feature Learning under Structured Data. (arXiv:2309.03843v1 [stat.ML])
    Recent works have demonstrated that the sample complexity of gradient-based learning of single index models, i.e. functions that depend on a 1-dimensional projection of the input data, is governed by their information exponent. However, these results are only concerned with isotropic data, while in practice the input often contains additional structure which can implicitly guide the algorithm. In this work, we investigate the effect of a spiked covariance structure and reveal several interesting phenomena. First, we show that in the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction, even when the spike is perfectly aligned with the target direction. Next, we show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue. Further, by exploiting the alignment between the (spiked) input covariance and the target, we obtain improved sample complexity compared to the isotropic case. In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent while also outperforming lower bounds for rotationally invariant kernel methods.
    Knowledge Distillation Layer that Lets the Student Decide. (arXiv:2309.02843v1 [cs.CV] CROSS LISTED)
    Typical technique in knowledge distillation (KD) is regularizing the learning of a limited capacity model (student) by pushing its responses to match a powerful model's (teacher). Albeit useful especially in the penultimate layer and beyond, its action on student's feature transform is rather implicit, limiting its practice in the intermediate layers. To explicitly embed the teacher's knowledge in feature transform, we propose a learnable KD layer for the student which improves KD with two distinct abilities: i) learning how to leverage the teacher's knowledge, enabling to discard nuisance information, and ii) feeding forward the transferred knowledge deeper. Thus, the student enjoys the teacher's knowledge during the inference besides training. Formally, we repurpose 1x1-BN-ReLU-1x1 convolution block to assign a semantic vector to each local region according to the template (supervised by the teacher) that the corresponding region of the student matches. To facilitate template learning in the intermediate layers, we propose a novel form of supervision based on the teacher's decisions. Through rigorous experimentation, we demonstrate the effectiveness of our approach on 3 popular classification benchmarks. Code is available at: https://github.com/adagorgun/letKD-framework
    Trinary Decision Trees for missing value handling. (arXiv:2309.03561v1 [stat.ML])
    This paper introduces the Trinary decision tree, an algorithm designed to improve the handling of missing data in decision tree regressors and classifiers. Unlike other approaches, the Trinary decision tree does not assume that missing values contain any information about the response. Both theoretical calculations on estimator bias and numerical illustrations using real data sets are presented to compare its performance with established algorithms in different missing data scenarios (Missing Completely at Random (MCAR), and Informative Missingness (IM)). Notably, the Trinary tree outperforms its peers in MCAR settings, especially when data is only missing out-of-sample, while lacking behind in IM settings. A hybrid model, the TrinaryMIA tree, which combines the Trinary tree and the Missing In Attributes (MIA) approach, shows robust performance in all types of missingness. Despite the potential drawback of slower training speed, the Trinary tree offers a promising and more accurate method of handling missing data in decision tree algorithms.
    Copula Representations and Error Surface Projections for the Exclusive Or Problem. (arXiv:1907.04483v2 [cs.LG] UPDATED)
    The exclusive or (xor) function is one of the simplest examples that illustrate why nonlinear feedforward networks are superior to linear regression for machine learning applications. We review the xor representation and approximation problems and discuss their solutions in terms of probabilistic logic and associative copula functions. After briefly reviewing the specification of feedforward networks, we compare the dynamics of learned error surfaces with different activation functions such as RELU and tanh through a set of colorful three-dimensional charts. The copula representations extend xor from Boolean to real values, thereby providing a convenient way to demonstrate the concept of cross-validation on in-sample and out-sample data sets. Our approach is pedagogical and is meant to be a machine learning prolegomenon.
    Causal thinking for decision making on Electronic Health Records: why and how. (arXiv:2308.01605v3 [stat.ME] UPDATED)
    Accurate predictions, as with machine learning, may not suffice to provide optimal healthcare for every patient. Indeed, prediction can be driven by shortcuts in the data, such as racial biases. Causal thinking is needed for data-driven decisions. Here, we give an introduction to the key elements, focusing on routinely-collected data, electronic health records (EHRs) and claims data. Using such data to assess the value of an intervention requires care: temporal dependencies and existing practices easily confound the causal effect. We present a step-by-step framework to help build valid decision making from real-life patient records by emulating a randomized trial before individualizing decisions, eg with machine learning. Our framework highlights the most important pitfalls and considerations in analysing EHRs or claims data to draw causal conclusions. We illustrate the various choices in studying the effect of albumin on sepsis mortality in the Medical Information Mart for Intensive Care database (MIMIC-IV). We study the impact of various choices at every step, from feature extraction to causal-estimator selection. In a tutorial spirit, the code and the data are openly available.
    Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck. (arXiv:2309.03800v1 [cs.LG])
    This work investigates the nuanced algorithm design choices for deep learning in the presence of computational-statistical gaps. We begin by considering offline sparse parity learning, a supervised classification problem which admits a statistical query lower bound for gradient-based training of a multilayer perceptron. This lower bound can be interpreted as a multi-resource tradeoff frontier: successful learning can only occur if one is sufficiently rich (large model), knowledgeable (large dataset), patient (many training iterations), or lucky (many random guesses). We show, theoretically and experimentally, that sparse initialization and increasing network width yield significant improvements in sample efficiency in this setting. Here, width plays the role of parallel search: it amplifies the probability of finding "lottery ticket" neurons, which learn sparse features more sample-efficiently. Finally, we show that the synthetic sparse parity task can be useful as a proxy for real problems requiring axis-aligned feature learning. We demonstrate improved sample efficiency on tabular classification benchmarks by using wide, sparsely-initialized MLP models; these networks sometimes outperform tuned random forests.
    Medoid Silhouette clustering with automatic cluster number selection. (arXiv:2309.03751v1 [cs.LG])
    The evaluation of clustering results is difficult, highly dependent on the evaluated data set and the perspective of the beholder. There are many different clustering quality measures, which try to provide a general measure to validate clustering results. A very popular measure is the Silhouette. We discuss the efficient medoid-based variant of the Silhouette, perform a theoretical analysis of its properties, provide two fast versions for the direct optimization, and discuss the use to choose the optimal number of clusters. We combine ideas from the original Silhouette with the well-known PAM algorithm and its latest improvements FasterPAM. One of the versions guarantees equal results to the original variant and provides a run speedup of $O(k^2)$. In experiments on real data with 30000 samples and $k$=100, we observed a 10464$\times$ speedup compared to the original PAMMEDSIL algorithm. Additionally, we provide a variant to choose the optimal number of clusters directly.
    Adversarially Robust Deep Learning with Optimal-Transport-Regularized Divergences. (arXiv:2309.03791v1 [cs.LG])
    We introduce the $ARMOR_D$ methods as novel approaches to enhancing the adversarial robustness of deep learning models. These methods are based on a new class of optimal-transport-regularized divergences, constructed via an infimal convolution between an information divergence and an optimal-transport (OT) cost. We use these as tools to enhance adversarial robustness by maximizing the expected loss over a neighborhood of distributions, a technique known as distributionally robust optimization. Viewed as a tool for constructing adversarial samples, our method allows samples to be both transported, according to the OT cost, and re-weighted, according to the information divergence. We demonstrate the effectiveness of our method on malware detection and image recognition applications and find that, to our knowledge, it outperforms existing methods at enhancing the robustness against adversarial attacks. $ARMOR_D$ yields the robustified accuracy of $98.29\%$ against $FGSM$ and $98.18\%$ against $PGD^{40}$ on the MNIST dataset, reducing the error rate by more than $19.7\%$ and $37.2\%$ respectively compared to prior methods. Similarly, in malware detection, a discrete (binary) data domain, $ARMOR_D$ improves the robustified accuracy under $rFGSM^{50}$ attack compared to the previous best-performing adversarial training methods by $37.0\%$ while lowering false negative and false positive rates by $51.1\%$ and $57.53\%$, respectively.
    On the dynamics of multi agent nonlinear filtering and learning. (arXiv:2309.03557v1 [stat.ML])
    Multiagent systems aim to accomplish highly complex learning tasks through decentralised consensus seeking dynamics and their use has garnered a great deal of attention in the signal processing and computational intelligence societies. This article examines the behaviour of multiagent networked systems with nonlinear filtering/learning dynamics. To this end, a general formulation for the actions of an agent in multiagent networked systems is presented and conditions for achieving a cohesive learning behaviour is given. Importantly, application of the so derived framework in distributed and federated learning scenarios are presented.  ( 2 min )

  • Open

    Instacart boosts AI capacity, readies for IPO with OpenAI's ChatGPT-powered eCommerce search
    On the verge of its IPO, Instacart has introduced major AI-powered features to its Storefront platform and the smart Caper Carts. Main upgrades: conversational search powered by OpenAI's ChatGPT and inbuilt AI models. To stay on top of the latest advancements in AI, look here first. https://preview.redd.it/olqtxvwjo3nb1.png?width=750&format=png&auto=webp&s=d8eaefbb9865c51732efc2792ec386610ecd38e6 AI advancements in Instacart's infrastructure Instacart, which holds approximately 22% of the $132 billion US online grocery-delivery market, has been leaning more towards being a tech platform. The new Instacart Storefront, entailing features driven by 150 proprietary AI models, is built on the same core infrastructure as the Instacart app. Customers can engage in open-ended searches on retailers' storefronts via the search bar. AI upgrades in Caper Carts AI-powered Caper Carts by Instacart have been upgraded. Customers can now order directly from their Caper Cart and get informed when their orders are ready. Camera and weight sensor efficiency is enhanced thanks to improved AI models, ensuring a smoother shopping journey and providing an extra layer of security against suspicious activity. (source) P.S. If you want this kind of analysis, delve into the latest updates in AI with our free newsletter, already favored by professionals from Google, Meta, and OpenAI. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    AI girlfriend ads are flooding Instagram and TikTok
    Tech startups are running sexually explicit ads for apps promoting not-safe-for-work experiences on platforms like Facebook, Instagram, and TikTok. These ads feature digitally created potential 'girlfriends' with large breasts and tight clothing, and some even use popular children's TV characters to promote 'NSFW pics' apps. NBC News found 35 app developers running sexually explicit ads on Meta-owned apps, and 14 app developers running similar ads on TikTok. The marketing push is part of an AI gold rush, capitalizing on the surge of interest in AI and benefiting from a double standard that hurts real human sex workers. Researchers believe that the gender-based slant in these ads reflects social media platforms allowing sex-related ads only if the intended audience is men. Meta and TikTok have stepped up their removal of sexually explicit AI ads after NBC News contacted them, but questions remain about how the ads got through their filters in the first place. Similar ads also appear in the Apple and Google app stores, although the extent of advertising there is unknown. Source : https://www.nbcnews.com/tech/social-media/ai-girlfriend-ads-instagram-tiktok-chat-pics-chatgpt-dose-rcna97547 submitted by /u/NuseAI [link] [comments]  ( 9 min )
    AI — weekly megathread!
    News provided by aibrews.com Technology Innovation Institute in Abu Dhabi has released Falcon 180B - a large language model with 180 billion parameters, trained on 3.5 trillion tokens. It's currently the largest openly available model, and rivals proprietary models like PaLM-2. Falcon 180B is 2.5 times larger than Llama 2 and was trained with 4x more compute. It is available for both research and commercial use [Details]. Meta AI released Belebele, a first-of-its-kind multilingual reading comprehension dataset spanning 122 language variants, enabling direct comparison of how well models understand different languages [Details]. Meta AI has published Code Llama’s research paper with more information on training, evaluation results and safety [Paper]. Open Interpreter, an open-source, …  ( 10 min )
    Animating a 2D image in real time
    Hello Everyone, i have recently started working on a project, where I need to animate an image of a face in real time to speak sentences. Essentially I am trying to build a face for my own large language model. I know of Nvidia's Audio2Face and Metahuman, but these are all in 3D and take a lot of time rendering the lip and eye animations. I need something, which works only with a bit of latency. ​ Does anyone know a service or a repo I could use to animate a 2D picture to speak text? submitted by /u/Fabianslife [link] [comments]  ( 9 min )
    Free AI transforms text and images into amazing videos - Pika Labs
    submitted by /u/the_anonymizer [link] [comments]  ( 9 min )
    Would ChatGPT work to help with looking for WFH jobs?/changing careers?
    This is a complete ChatGPT beginner question but has anyone ever downloaded it and used it to help with looking for specific job roles? Mainly WFH related? Or thought about changing careers and used ChatGPT to help with that? I know there are a lot of other ways to go about this but would ChatGPT help with this at all? submitted by /u/jackbowls [link] [comments]  ( 9 min )
    Do you feel endangered by the rise of AI?
    View Poll submitted by /u/MiladMansory [link] [comments]  ( 9 min )
    AI grading and AI screening but no AI for homework/assignments/exam?
    Professors send emails explaining that they use AI but they reviewed the grades from AI to make sure everything is fine. But students can’t use AI and then review the results just make sure everything is fine. submitted by /u/PrettyHappyAndGay [link] [comments]  ( 9 min )
    One-Minute Daily AI News 8/7/2023
    A new AI tool developed by startup Delphi allows users to create virtual clones of themselves or anyone else. Users can upload an ID and add various files, such as emails, chat transcripts, and videos, to generate an AI chatbot that mimics their personality.[1] OpenAI will host its first developer conference on November 6.[2] Meta Platforms Inc. today released FACET, a benchmark dataset designed to help researchers audit computer vision models for bias.[3] Australia to require AI-made child abuse material be removed from search results.[4] Sources: [1] https://technotrenz.com/news/a-new-ai-service-allows-for-the-creation-of-a-virtual-version-of-yourself-or-a-loved-one-that-is-capable-of-making-phone-calls-on-your-behalf-2772634.html [2] https://techcrunch.com/2023/09/06/openai-will-host-its-first-developer-conference-on-november-6/ [3] https://siliconangle.com/2023/08/31/meta-releases-facet-dataset-evaluating-ai-fairness/ [4] https://www.reuters.com/technology/australia-require-ai-made-child-abuse-material-be-removed-search-results-2023-09-08/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    AGI will be not feasible any time soon, here's why
    I was thinking today about all the AI hype we have right with somewhat a bunch of new breakthroughs each month, but things not only are getting slower updates, but the updates impacts itself are becoming lesser. If that is not enough, well we have big problems ahead, such as processors are reaching the physical limit, quantum effects disrupting the works, wafers becoming increasing more expensive, the size reduction is no longer adding the same boosts in power and new materials are just far from viable. On top of this we are going meet two other walls, the software and the energy. About the first, as we make better and more complex algorithms for computation the harder it gets to make better ones to squeeze more power and handle more complex tasks. The second, is becoming more real as bi…  ( 12 min )
  • Open

    [P] MLOps for Vercel OpenAI chatbot infrastructure
    I used infrastructure as code (IaC) to provision and deploy Vercel's next-openai example. IaC is useful because it applies the same rigor of application code development to infrastructure provisioning. Instead of manual point and click in a cloud console which can be unrepeatable or error-prone, you just store and change all infrastructure configurations as code in source control . This example uses Pulumi which allows you to write the IaC in Python. https://github.com/aaronkao/vercel-py-openai-chatbot submitted by /u/kao-pulumi [link] [comments]  ( 9 min )
    Why Do You Not Use Open Source LLMs? (Or do you?) [D] (Repost because I made a mistake in the title)
    Reposting because I intended to ask about LLMs, not AI in general, and forgot that I don't need to dumb down the terminology for this sub. Thanks to the people who pointed out that mistake. --- original post --- This is something I'm curious about. I've seen a few people declaring that they're not using open source LLMs because they're GPU-poor, because the models aren't good enough, because the uis/frontends are hard to get started with, etc., and I've been wondering how much these comments and posts reflect the opinions and needs of the community as a whole. So, here's a poll. Answer away if you feel like it. I'm sharing this on a few other subs too (for the sake of greater information gathering) so please don't vote more than once. If your reasoning is not on here, feel free to comment your thoughts. If more than one option describes you, please select the one that describes you the most. View Poll submitted by /u/Heralax_Tekran [link] [comments]  ( 9 min )
    [D] What are good resources for creating NLP algorithms from scratch?
    I'm looking to learn more about concurrency/parallelism, optimization, data structures and algorithms from an NLP perspective. submitted by /u/Al_Miksiki [link] [comments]  ( 9 min )
    [D] Please Help - Machine Learning (ML) Engineers
    Hello Everyone, I'm currently exploring the idea of a solution tailored for ML engineers and technologists. While I have a background in recruiting, I've often found myself dissatisfied with the typical recruitment process. It seems that many recruiters don't always appreciate the importance of working with candidates or understand the impact on people's livelihoods and careers. What I'm proposing is the creation of a career representation firm specifically designed for purpose-driven technologists specializing in data, product, and hardware careers. This firm would advocate for the career interests of the most passionate ML engineers. Our representation would encompass: - Strategic Career Development: Crafting a strategic approach to help engineers secure opportunities aligned with their desired projects and professional development. - Impact Matching: Identifying and connecting engineers with projects and teams where their technical skills, career goals, and personal interests can have the greatest positive impact, ensuring that your work aligns with your values and aspirations. - Industry Leadership: Positioning you as an industry leader by marketing your expertise and securing speaking engagements at conferences and other events, enhancing your professional visibility and reputation. In return for this representation, engineers would commit to a 3% fee deducted from their salary, which would support the services provided by the firm. Would you be interested in participating in such a service? If not, would you consider recommending it to someone you know? If you are in favor of this idea, what makes you believe it would be advantageous for others even if it might not be your preference? Do you think you could personally benefit from this type of career representation? Thanks! submitted by /u/Educational_Bar_6352 [link] [comments]  ( 10 min )
    [R] Algorithm of Thoughts Prompt Engineering Breakdown
    Paper: https://arxiv.org/abs/2308.10379 Saw someone else post about this new prompting method on the sub here so I decided to put together a run down and prompt template. Pretty interesting to see the different methods emerge and how some attempt to simulate how code runs. My rundown -> https://www.prompthub.us/blog/how-algorithm-of-thoughts-prompting-works submitted by /u/dancleary544 [link] [comments]  ( 9 min )
    [R][D] How to implement Sinusoidal Positional Embedding?
    Hi fellow computer scientists, so I've been researching a little about transformers and meanwhile I had to understand sinusoidal positional embedding. I have found two implementations for this, after testing both approaches I found they compute different embeddings for the same position/timestep with the same embedding dimensions... shouldn't it be equal if the position and embedding dimensions are the same? This is getting me confused, because now I don't know which implementation should I consider... Do you have any suggestions to where I can look? Thank you :) submitted by /u/Christs_Elite [link] [comments]  ( 9 min )
    "[Discussion]"
    Hi guys, I'm completely new in this field.. I have a research in civil engineering and need to learn python, machine learning and data analysis as short as possible. Where can I achieve that?? please help me by naming the best courses or any free materials available🙏 submitted by /u/Ok-Upstairs7749 [link] [comments]  ( 9 min )
    [P] Question answering based on book-summaries
    I'm one of those people who always ask questions about movies because there's something they don't get or have forgotten. Especially with more complex stories, like Game of Thrones. At the moment I'm reading Wheel of Time, a rather long fantasy series. I had the idea to build the following WebApp: There is online each chapter of the series summarized separately. So in the WebApp I could ask questions about the content. In addition, I can indicate which chapter I am reading, so that it is ensured not to spoil the user. I want to avoid to train a model. I would prefer to use one of the existing open-source models, like llama. A first, primitive idea: give the LLM all the summaries and the user's question. But this would mean to give all summaries as input every time. Not only that this approach would not be elegant, the restriction in the input size (number of words) would make this possibly even impossible. Feel free to share your ideas how i could solve this :) submitted by /u/Individual-Cause-616 [link] [comments]  ( 9 min )
    Help me with creating dataset from .mat files [D]
    I have so many .mat files in a folder which have two arrays inside each .mat file. that is, for each .mat file, i have a (224*224) array and another (136,1) array. These 224*224 arrays are my X_trains for a model and these corresponding 136*1 arrays are my y_trains (labels). i can read these files as np arrays using scipy's loadmat. My problem is, is there a way to usen tf.data .Dataset object to send these to a model or there is any other way? Also using this tf.data.Dataset can i split into train, test, val data? submitted by /u/likhith-69 [link] [comments]  ( 9 min )
    [P] CLI tool to benchmark 100+LLMs response, response time, cost
    Hi r/MachineLearning, I built a CLI tool to benchmark 100+ LLMs for a given question. Benchmark output allows you to compare responses, response time and cost. Try it here: https://github.com/BerriAI/litellm/blob/main/cookbook/benchmark/readme.md CLI Output: Output from CLI Tool Simply select your LLMs, enter your API keys, LLM configs and run python3 benchmark.py Happy completion()! submitted by /u/Comfortable_Dirt5590 [link] [comments]  ( 9 min )
    Text summarization [P]
    Hey! If anyone has worked with text summarization before especially with TF-IDF and extractive summarization,kindly please dm me. Hope you have a great day! submitted by /u/Ok-Avocado-5370 [link] [comments]  ( 9 min )
    [P] A look at Apple’s new Transformer-powered predictive text model
    In the upcoming versions of macOS and iOS, Apple is including a predictive text model which offers suggestions while you type, which they’ve said to be a "transformer model". I managed to find some details about this model, including details about its topology and tokenizer, and I was even able to peek in and see several of its top predictions while typing! Blogpost: https://jackcook.com/2023/09/08/predictive-text.html Source code: https://github.com/jackcook/predictive-spy Hopefully this can give some insight into some of the trade-offs that Apple went through to put a model on every iPhone and MacBook — it’s small, it has a pretty narrow scope, and it’s not very capable on its own. Let me know what you think! submitted by /u/jackcook [link] [comments]  ( 9 min )
    [P] AI Beats Hockolicious, Trackmania's Most Prestigious Map
    Follow-up on our previous post (Vision-based reinforcement learning for Trackmania: close or at superhuman level). Many comments rightfully pointed that the map we trained on: - lacked difficult features like jumps, airbrakes, drifts, ... - had not widely been played by humans We have now trained the same AI on the game's most prestigious map: Hockolicious. We also prepared a video describing the approach with much more detail. Here is our result :) AI Beats Hockolicious, Trackmania's Most Prestigious Map Note: We are still using a convolutional neural network with a structure similar to Nature's DQN paper. I am curious whether other architectures (the ResNet-like in the IMPALA paper ?) could help. Do you have any suggestions on how the neural network's vision head should be structured for that specific task? submitted by /u/Linesight_rl [link] [comments]  ( 9 min )
    [D] Methodology for counting/segmenting objects in close formations
    Hello all. I'm new to object recognition and instance segmentation. I am trying to work on a project in which I use drone imagery to detect objects that are in close formations with each other. I do this for the purpose of counting particular objects, as well as to check if an object has moved (by making a prediction on drone imagery that is taken later). Create masks? I'm now trying to understand what methodology/models make sense. First of all, should I be looking at creating masks, or do bounding boxes suffice? My idea was that masks are better, since bounding boxes overlap with each other and can miss that an object has moved slightly, Or am I wrong and are masks just an extra hassle? Or shouldn't I be looking at bounding boxes or masks at all? MaskRCNN? Model-wise, should I be lo…  ( 10 min )
    [D] Chains and Agents
    I think there's a lot of confusion around AI agents today and it's mainly because of lack of definition and using the wrong terminology. We've been talking to many companies who are claiming they're working on agents but when you look under the hood, they are really just chains. I just listened to the Latent Space pod with Harrison Chase (Founder of Langchain) and I really liked how he thinks about chains vs agents. Chains: sequence of tasks in a more rigid order, where you have more control, more predictability. Agents: handling the edge-cases, the long-tail of things that can happen. And the most important thing is that it's not an OR question but an AND one: you can use them in the same application by starting with chains -> figuring our the edge-cases -> using agents to deal with them. https://preview.redd.it/l59sc4sri0nb1.png?width=3127&format=png&auto=webp&s=1f3f8730c48687eaabf1f554deb181cf35b96036 submitted by /u/BootstrapGuy [link] [comments]  ( 9 min )
    [D] Question for Jensen Huang
    I have the opportunity to see Jensen speak in the next month at a semi private event, 250-300 people. I will probably have the opportunity to ask him a question. What would you ask him? submitted by /u/Zealousideal-Food285 [link] [comments]  ( 9 min )
    [D] Object detection in 3D
    Greetings, people. My colleague told me about some methods of object detection/classification on 3D models, and now I'm exploring them. But during my research I couldn't find that much information about them. I would like to ask you to provide me information, literature and examples of application for them. I remember that one of the techniques is called voxelization. But still not able to find great and intuitive example. Would be thankful for any information :) submitted by /u/thattallsoldier [link] [comments]  ( 9 min )
    [D] What object detection and segmentation model repos do you folks use for production
    Looking at all the popular yolo repos, v5, v7,v8, yolo-nas, all of them seem to have restrictive licenses (gpl3, agpl, apache 2) where the trained model files also falls under the license. What do people usually use to deploy detection/segmentation in production, especially with resource constraints (can't use something like fast-rcnn) submitted by /u/Appropriate_Bear_894 [link] [comments]  ( 9 min )
    [D] Proper use of ai-voice-cloning / rvc / tortoise
    Hey guys! I need some help here.. many days trying to get good results but without success. So I already have the voice I want to use (edited with uvr5 and it sounds really great, without any echo or noise on the background), I trained it on aivc so that I can generate this voice verbalizing the text content I need. I used high quality - it took like 40min to generate each phrase - and it is ok, but still a little robotic. So I installed RVC and trained a model with the original voice (edited with the uvr5) just like I did the training on aivc. So I loaded the trained model on the inference tab and I selected the audio to be processed - the generated audio files from aivc. Even selecting the harvest mode, the output was worse than the generated files from aivc. I even tried to record my own voice speaking the text but it does not sound good. My trained model on rvc has 500 epochs, and it may be a very good model to use, yet idk what I’m doing wrong. Maybe I’m misusing rvc, so what I need is to improve the realism of my aivc(or tortoise) generated voices, simple as that, is rvc the best option to do this? If yes, how? Any help please would be much appreciated thanks! submitted by /u/JustSayin_thatuknow [link] [comments]  ( 9 min )
    [R] FLM-101B: An Open LLM and How to Train It with $100K Budget
    submitted by /u/hzj5790 [link] [comments]  ( 9 min )
    [R] Seeking Research Papers on Weight Manipulation in Physics-Informed Neural Networks (PINNs)
    Would you kindly share noteworthy papers that have caught your interest concerning the weights of physics-informed neural networks (PINNs)? I am looking for any innovative paper that has something to do with weights of the physics-informed neural networks or deep neural networks in general and its manipulation. Specifically, I am seeking innovative papers on weight manipulation in physics-informed neural networks. For instance papers like: Weight initialization algorithm for physics-informed neural networks using finite differences Transfer Learning with Physics-Informed Neural Networks for Efficient Simulation of Branched Flows Note that I am referring to the actual weights of the neural network and not the weights of the loss terms. I have to add that ideas from transfer learning are welcome too. submitted by /u/ai_physics2023 [link] [comments]  ( 9 min )
    [P][R] Finetune LLMs via the Finetuning Hub
    Hi ML community, I have been working on benchmarking publicly available LLMs these past couple of weeks. More precisely, I am interested on the finetuning piece since a lot of businesses are starting to entertain the idea of self-hosting LLMs trained on their proprietary data rather than relying on third party APIs. To this point, I am tracking the following 4 pillars of evaluation that businesses are typically look into: - Performance - Time to train an LLM - Cost to train an LLM - Inference (throughput / latency / cost per token) For each LLM, my aim is to benchmark them for popular tasks, i.e., classification and summarization. Moreover, I would like to compare them against each other. So far, I have benchmarked Flan-T5-Large, Falcon-7B and RedPajama and have found them to be very efficient in low-data situations, i.e., when there are very few annotated samples. Llama2-7B/13B and Writer’s Palmyra are in the pipeline. But there’s so many LLMs out there! In case this work interests you, would be great to join forces. GitHub repo attached — feedback is always welcome :) https://github.com/georgian-io/LLM-Finetuning-Hub Happy hacking! submitted by /u/l-llm [link] [comments]  ( 9 min )
  • Open

    AI pilot programs look to reduce energy use and emissions on MIT campus
    A cross-departmental team is leading efforts to utilize machine learning for increased efficiency in heating and cooling MIT’s buildings.  ( 10 min )
    Jackson Jewett wants to design buildings that use less concrete
    The PhD student is honing algorithms for designing large structures with less material — helping to shrink the construction industry’s huge carbon footprint.  ( 10 min )
  • Open

    Can't solve Gymnasium Frozenlake-v1 8x8 with A2C
    Hello, I'm trying to solve the Frozenlake-v1 environment with is_slippery = True (non-deterministic) with the stable baselines 3 A2C algorithm. I can solve the 4x4 version but I can't achieve any results with the 8x8 version. I also checked the RL-Zoo to see if there is any hyperparameter tunning about that environment but there is nothing. Which adjustments can I do to make it work properly? submitted by /u/MetallicaSPA [link] [comments]  ( 9 min )
    RL in games
    Hello guys, I was suddenly inspired to make a WH Gladius bot. Background: I recently got into the game, it seems very interesting to me personally, but alas, there are not enough guides on it for you to learn how to play at a high level. I don’t intend to spend hundreds of hours to master the base, so I decided to try something like RARL so that this thing would learn, and I could analyze its moves, change the conditions and thus start playing at an intermediate level faster. However, a superficial analysis revealed that the game does not have an API at all. Let's say I could grab some stats using Cheat Engine and OllyDbg, but I have no idea how to fit it into the gym. Or does gym as env need to pass a link to the client from the machine so that it only restarts it? In general, if anyone has done something similar, I ask for a link to a guide or a similar example. All the best submitted by /u/kapedalex [link] [comments]  ( 9 min )
    AI Beats Hockolicious, Trackmania's Most Prestigious Map
    Follow-up on our previous post (Vision-based reinforcement learning for Trackmania: close or at superhuman level). Many comments rightfully pointed that the map we trained on: - lacked difficult features like jumps, airbrakes, drifts, ... - had not widely been played by humans We have now trained the same AI on the game's most prestigious map: Hockolicious. We also prepared a video describing the approach with much more detail. Here is our result :) AI Beats Hockolicious, Trackmania's Most Prestigious Map Note: We are still using a convolutional neural network with a structure similar to Nature's DQN paper. I am curious whether other architectures (the ResNet-like in the IMPALA paper ?) could help. Do you have any suggestions on how the neural network's vision head should be structured for that specific task? submitted by /u/Linesight_rl [link] [comments]  ( 9 min )
    Difference between experience replay and multi time-step inputs.
    In DQN, if I want to train a model which takes into account of the current state and previous k states, do I use consecutive experience replay to achieve this or should I implement a DNN with multi time-step inputs? Is the latter allowed, considering the Markov assumption from MDP update? I only have a superficial understanding on the purpose of experience replay, which is used to stabalise the training process and break correlations from consecutive training samples. submitted by /u/cj_1993 [link] [comments]  ( 9 min )
  • Open

    NVIDIA Partners With India Giants to Advance AI in World’s Most Populous Nation
    The world’s largest democracy is poised to transform itself and the world, embracing AI on an enormous scale. Speaking with the press Friday in Bengaluru, in the context of announcements from two of India’s largest conglomerates, Reliance Industries Limited and Tata Group, NVIDIA founder and CEO Jensen Huang detailed plans to bring AI technology and Read article >  ( 6 min )
  • Open

    Implement smart document search index with Amazon Textract and Amazon OpenSearch
    In this post, we’ll take you on a journey to rapidly build and deploy a document search indexing solution that helps your organization to better harness and extract insights from documents. Whether you're in Human Resources looking for specific clauses in employee contracts, or a financial analyst sifting through a mountain of invoices to extract payment data, this solution is tailored to empower you to access the information you need with unprecedented speed and accuracy.  ( 11 min )
    Semantic image search for articles using Amazon Rekognition, Amazon SageMaker foundation models, and Amazon OpenSearch Service
    Digital publishers are continuously looking for ways to streamline and automate their media workflows in order to generate and publish new content as rapidly as they can. Publishers can have repositories containing millions of images and in order to save money, they need to be able to reuse these images across articles. Finding the image that best matches an article in repositories of this scale can be a time-consuming, repetitive, manual task that can be automated. It also relies on the images in the repository being tagged correctly, which can also be automated (for a customer success story, refer to Aller Media Finds Success with KeyCore and AWS). In this post, we demonstrate how to use Amazon Rekognition, Amazon SageMaker JumpStart, and Amazon OpenSearch Service to solve this business problem.  ( 10 min )
    Improving asset health and grid resilience using machine learning
    Machine learning (ML) is transforming every industry, process, and business, but the path to success is not always straightforward. In this blog post, we demonstrate how Duke Energy, a Fortune 150 company headquartered in Charlotte, NC., collaborated with the AWS Machine Learning Solutions Lab (MLSL) to use computer vision to automate the inspection of wooden utility poles and help prevent power outages, property damage and even injuries.  ( 13 min )
  • Open

    Resources to learn relevant linear algebra
    Hello, I have just started a course on neural networks at college and I have found myself lost on the linear algebra. I have no experience using or learning linear algebra so I am extremely confused about eigenvalue decomposition, single value decomposition, and just matrix stuff in general. I was wondering if you all had any resources to share that would help me to learn the relevant linear algebra for creating neural networks. Thank you! submitted by /u/smelliothax [link] [comments]  ( 9 min )
    Help me with creating dataset from .mat files, please
    I have so many .mat files in a folder which have two arrays inside each .mat file. that is, for each .mat file, i have a (224*224) array and another (136,1) array. These 224*224 arrays are my X_trains for a model and these corresponding 136*1 arrays are my y_trains (labels). i can read these files as np arrays using scipy's loadmat. My problem is, is there a way to usen tf.data .Dataset object to send these to a model or there is any other way? Also using this tf.data.Dataset can i split into train, test, val data? submitted by /u/likhith-69 [link] [comments]  ( 9 min )
    Noob here - question about learning an image transformation function
    Suppose that we have a function f(I) that transforms the an RGB image I of size WxH in another RGB image O of size WxH (one example of f could be RGB to gray scale conversion, where O is such that for every pixel i, Ri=Gi=Bi). Suppose that the function f requires seconds of computations on an average PC. My goal is to understand if a neural network can learn f and be faster than f itself, given the fact that a training dataset of pairs (Ii, Oi) (in the thousands or even in the millions) is easy to create. What type of neural network is better suited for this job? submitted by /u/lukeboh [link] [comments]  ( 9 min )
    Design2Prompt
    Guys, I'm looking for an AI that will describe my figma design in detail for another model to write the code in flutter. Is there anything like that out there? submitted by /u/Aru-sejin37 [link] [comments]  ( 9 min )
  • Open

    Understanding social biases through the text-to-image generation lens
    Gender, race, and age disparities in AI-generated images persist. This AIES 2023 study on text-to-image models shows that even basic prompts can lead to underrepresentation, calling for responsible bias mitigation strategies. The post Understanding social biases through the text-to-image generation lens appeared first on Microsoft Research.  ( 10 min )
    Intern Insights: Dr. Josh Benaloh with Anunay Kulshrestha and Karan Newatia
    Every year, interns help advance research at Microsoft. In “Intern Insights,” PhD students Anunay Kulshrestha and Karan Newatia talk with cryptographer Josh Benaloh about working on the verifiable election technology ElectionGuard. The post Intern Insights: Dr. Josh Benaloh with Anunay Kulshrestha and Karan Newatia appeared first on Microsoft Research.  ( 30 min )
  • Open

    Justifiable sample size
    One of the most common things a statistician is asked to do is compute a sample. There are well known formulas for this, so why isn’t calculating a sample size trivial? As with most things in statistics, plugging numbers into a formula is not the hard part. The hard part is deciding what numbers to […] Justifiable sample size first appeared on John D. Cook.  ( 6 min )

  • Open

    Anthropic: From startup to AI powerhouse with Claude Pro launch
    Anthropic, a startup composed of former OpenAI staff, has announced the release of its premium subscription plan, Claude Pro, for Claude 2, its AI-driven chatbot. The affordable subscription offers a plethora of features for users. To stay on top of the latest advancements in AI, look here first. Anthropic's Claude Pro: Cost and Features Priced at $20 per month in the U.S. or £18 in the U.K., users will have access to "5x more usage" compared to the free tier of Claude 2. Subscribers can send unlimited messages, gain priority during high-traffic periods, and get early access to new enhancements. The new package is priced similarly to OpenAI’s paid plan for ChatGPT Plus, a direct rival to Claude 2. Rationale and User Value Since its launch in July, users have praised Claude for…  ( 10 min )
    Thought Experiment: “The Reverse Deep Learning Paradigm”
    submitted by /u/nicdunz [link] [comments]  ( 9 min )
    be my ai vs bing vs bard
    submitted by /u/nicdunz [link] [comments]  ( 8 min )
    Who is missing from the TIME 100 most influential people in AI?
    Who do you think is not on this list but should be? https://time.com/collection/time100-ai/ ​ submitted by /u/smo279 [link] [comments]  ( 9 min )
    What technological improvements led to the current AI boom?
    I have studied artificial intelligence about 15 years ago, and have left the field since. I am curious to learn what has been happening in the field after I've left. I know there's a lot of hype around generative AI like ChatGPT and Wall-E. I find it quite hard though to find out what's exactly the underlying technology breakthroughs that have allowed for these new applications. I mean, neural networks and similar machine learning techniques are already decades old. What technology led to the current AI boom? What would you say are the biggest conceptual improvements since? Or is it all just faster and bigger computers running 2000's tech? submitted by /u/math1985 [link] [comments]  ( 9 min )
    Falcon 180B—A Record-Breaking Open Source LLM on Hugging Face
    The AI community is buzzing with the arrival of Falcon 180B, an open-source LLM with an unprecedented 180 billion parameters. Developed by TII, This powerful model has surpassed key players like Meta's LLaMA 2 and matches commercial models like Google's PaLM-2. To stay on top of the latest advancements in AI, look here first. https://preview.redd.it/trscqxmncvmb1.jpg?width=480&format=pjpg&auto=webp&s=0590f4017937e70533414f93c72d9aa6edd62048 Falcon 180B's Unrivaled Performance This advanced LLM is trained on an astounding 3.5 trillion tokens. Falcon 180B's parameters are 2.5 times larger than LLaMA 2's. It outperforms LLaMA 2 in scale and benchmark performance across diverse NLP tasks. On evaluations like the HellaSwag benchmark, it rivals commercial models like Google's PaLM-2. Promising Future Techniques like weight randomization and Nvidia’s Perfusion have helped train Falcon 180B more efficiently. Now freely available on Hugging Face, Falcon 180B is set to benefit from further enhancements by the community. The model's demonstration of advanced natural language abilities makes it a thrilling development in open-source AI. (source) (demo) P.S. If you like this kind of analysis, I write a free newsletter that covers the most crucial news and studies in AI and tech. Professionals from Google, Meta, and OpenAI are already subscribed. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    How are AI services today when it comes to making content that requires distribution?
    I'm looking at stuff that could be submitted to a Netflix or Crunchyroll. I'm looking at some of the ai generated content out there, in particular some of the Instagram tutorials and they look really good but none of these are serials like comics, graphic novels, OAVs or even webcomics. submitted by /u/KrusMatrieya [link] [comments]  ( 9 min )
    Intuit cut hundreds of jobs and spent at least $20 billion in a massive bet on AI. Today the company is revealing its new virtual assistant
    submitted by /u/AminoOxi [link] [comments]  ( 9 min )
    Google takes on AI in political ads
    Google is updating its policy to require advertisers to disclose when their election ads include digitally altered or generated content. The update will go into effect in November, ahead of the 2024 presidential election. The goal is to provide transparency and help voters make informed decisions. Minor alterations that are inconsequential to the claims are exempt from the disclosure requirements. Source : https://thehill.com/newsletters/technology/4190769-googles-campaign-ai-crackdown/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Prepare for the Mine-Fest: Radical changes undermine all previous ownership assumptions and now everyone is shouting "Mine".
    Ownership is just a story that we tell each other, a social construct. If people don’t agree on these stories, the concept loses its inherent power. This is true of owning land, money, cars, houses, art, mines, oil-wells, factories, corporations, relationships, loyalties, copyrights, brands, patents or anything else that is owned by you, me or those ever-superior “others”. In a society where change occurs gradually, we become accustomed to the narratives that bind us together and determine who possesses significant wealth, resources, attention, power, fame, and other ego-gratifying treasures, and who has access to only meager portions of these. However, when societies change and new types of goods appear, there might be no agreement about who gets to own these. For example, while the con…  ( 10 min )
    One-Minute Daily AI News 9/6/2023
    The Consensus Search plugin allows users to find answers, search for papers, and draft pieces of content grounded in scientific research by searching our database of 200M+ papers directly within the ChatGPT interface.[1] Israel: AI Software Detects Bleeding Inside Brain During CT Scan; Helps Save Patient’s Life.[2] Chinese tech giant Tencent is launching its artificial intelligence model “Hunyuan” for business use at an annual summit on Thursday.[3] Google on Wednesday said it will mandate that political advertisements on its platforms disclose when images and audio have been altered or created using tools such as AI.[4] Sources: [1] https://consensus.app/home/blog/introducing-the-consensus-search-chatgpt-plugin/ [2] https://english.jagran.com/technology/israel-ai-program-detects-bleeding-inside-brain-during-ct-scan-helps-save-patient-life-full-story-10098464 [3] https://www.cnbc.com/2023/09/07/tencent-releases-ai-model-hunyuan-for-businesses-amid-china-competition.html [4] https://sg.news.yahoo.com/google-require-political-ads-disclose-010502103.html submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Generative AI poised to replace 2.4 million US jobs by 2030
    Forrester predicts that generative AI will replace 2.4 million US jobs by 2030, mostly white-collar roles, such as technical writers, proofreaders, copywriters, and administrative positions. But ironically, other forms of automation will displace more jobs. To stay on top of the latest advancements in AI, look here first. (Chart showing how much different types of jobs can expect to be influenced by technology) Concerns about Generative AI While the Generative AI impact is significant, other forms of automation are set to cause more widespread job displacement. The most impacted group will be middle-class, college-educated, white-collar workers, specifically those earning above $60,000 annually. Creative professionals stand to benefit Interestingly, workers in creative industries will likely utilize generative AI tools in their jobs rather than being replaced. This includes editors, writers, authors, poets, and lyricists. However, the use of such tools as ChatGPT may result in inconsistent outputs and even "coherent nonsense", leading to potential performance issues. (source) P.S. If you like this kind of analysis, I write a free newsletter that covers the most crucial news and studies in AI and tech. Professionals from Google, Meta, and OpenAI are already subscribed. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
  • Open

    [D]What do people think about papers published in the NeurIPS dataset track in comparison to those published in the main conference?
    I'm curious to learn about the perception of papers published in the NeurIPS dataset track in comparison to those published in the main conference. Specifically, I'd like to know how both companies and Ph.D. committees view these papers. Are they considered equally valuable, or is there a notable difference in their reputation and significance? Your insights and experiences would be greatly appreciated! submitted by /u/Longjumping-Yam6941 [link] [comments]  ( 9 min )
    [D] Training a language model for custom scripting language?
    Firstly some house keeping: I'm a bit of a noob at this whole AI / Machine Learning stuff - still trying to learn. This isn't a "do my homework for me" kind of post I know language processing can be taxing, I have up to 4 Tesla V100S 32 GB at my disposal Now that's out the way, here's the story: A team of us have created our own scripting language that is XML based that can do various actions against a database (or the file system) - a script is known as a "job" here is an example of a simple one Set variables by various methods and send their contents and an attachment by email: <SetVariable name="MyDateVar" value="1998-12-25…  ( 10 min )
    [D] Is inference optimization a thing?
    Let me give you a quick intro. My engineering experience primarily revolved around data processing, analytics, and distributed systems. Nonetheless, I had a desire to learn about ML, and imho the best way to learn is to work on a practical project. So, that's precisely what I did. A few months ago, I embarked on an exciting journey with a friend, and together, we've created http://github.com/huggingbench/huggingbench. Now, after three months, I find myself seeking validation for some of my assumptions from the broader community. If you'd like to learn more about our motivations and the path we've taken check out the blog post https://medium.com/@niksa.jakovljevic/introducing-huggingbench-a-path-to-optimized-model-serving-a17cecc8d3ec. What I'd like to gather from individuals with machine learning models in production is their level of investment in optimizing inference. Is this a commonplace practice? I acknowledge that it can vary on a case-by-case basis, but I'm still hopeful of identifying prevailing trends. After conversing with a few companies, I've come to the impression that only the truly large players (those spending six figures or more on inference per month) place significant emphasis on inference optimization, which is entirely understandable. Nevertheless, I sense that there are numerous low hanging fruits that could result in substantial cost savings, even for typical startups. Could it be that the entire machine learning field is still in its infancy, and many engineers may not be fully considering or prioritizing such optimizations? Perhaps businesses are not giving as much attention to cost considerations? Alternatively, there might be technical challenges I'm not yet aware of. In any case, I would greatly appreciate hearing your insights on the subject of inference optimization. submitted by /u/unsigned_mind [link] [comments]  ( 10 min )
    PLEASE HELP (LSTM FOR RAINFALL PREDICTION) [P] [D]
    I have been trying to build a DNN model for predicting the amount of rainfall but it has been hugely unsuccessful with just 40% accuracy even after CV and a high RMSE. I have read some research papers and they have suggested to use LSTM , I am aware of the concept but have never implemented. My dataset has arounf 15000 values of precipitation out of which 5000 values are zero (no rainfall at all) and I have 7 other features (including humidity , wind speed etc etc) . PLEASE HELP ! I NEED TO COMPLETE THIS FOR MY INTERNSHIP HAHA https://preview.redd.it/sg5v95ly5wmb1.png?width=1818&format=png&auto=webp&s=793bee830bb83f531f77e5c2a4ab47a5fb21eb3b submitted by /u/Decent_Ordinary1528 [link] [comments]  ( 9 min )
    [R] Open ASR Leaderboard
    Hugging Face benchmarked open source/ access models [English only] on 8 different speech datasets (LibriSpeech, Common Voice, VoxPopuli, TED-LIUM, Gigaspeech, SPGISpeech, Earnings-22 and AMI) 🤗 Leaderboard here: https://huggingface.co/spaces/hf-audio/open_asr_leaderboard submitted by /u/vaibhavs10 [link] [comments]  ( 9 min )
    [D] How can we improve LLM responses outside of fine-tuning & prompt engineering?
    Outside of better models, bigger, fine-tuning, etc, I'm wondering how we can get better responses from models. In my experience, I think prompt engineering can only take us so far. Models hallucinate often and I think we need to have some engineering solution to this. I've been looking at libraries doing token healing, which I find to be helpful (for example https://github.com/guidance-ai/guidance/tree/main) but outside of this, I'm wondering what other techniques people have been doing to improve model performance? submitted by /u/opt1malP0licy [link] [comments]  ( 9 min )
    [P] Open-source observability for LLMs without adapting new tools
    Hey all! I've written an open-source SDK for reporting metrics from LLM usage using OpenTelemetry. The great thing about it? With just one line of code you can get full visibility into your LLM app with your existing observability stack - straight into Datadog, Sentry, Honeycomb and others! Check it out (maybe give a ⭐?), and let me know your thoughts - https://github.com/traceloop/openllmetry submitted by /u/nirga [link] [comments]  ( 9 min )
    Falcon 180B—A Record-Breaking Open Source LLM on Hugging Face [N]
    The AI community is buzzing with the arrival of Falcon 180B, an open-source LLM with an unprecedented 180 billion parameters. Developed by TII, This powerful model has surpassed key players like Meta's LLaMA 2 and matches commercial models like Google's PaLM-2. To stay on top of the latest advancements in AI, look here first. ​ https://preview.redd.it/9xe5tczpdvmb1.jpg?width=480&format=pjpg&auto=webp&s=b7927d94a48fb75eaf05f6f0d8fe1089c0e1078b Falcon 180B's Unrivaled Performance This advanced LLM is trained on an astounding 3.5 trillion tokens. Falcon 180B's parameters are 2.5 times larger than LLaMA 2's. It outperforms LLaMA 2 in scale and benchmark performance across diverse NLP tasks. On evaluations like the HellaSwag benchmark, it rivals commercial models like Google's PaLM-2. Promising Future Techniques like weight randomization and Nvidia’s Perfusion have helped train Falcon 180B more efficiently. Now freely available on Hugging Face, Falcon 180B is set to benefit from further enhancements by the community. The model's demonstration of advanced natural language abilities makes it a thrilling development in open-source AI. (source) (demo) P.S. If you like this kind of analysis, I write a free newsletter that covers the most crucial news and studies in AI and tech. Professionals from Google, Meta, and OpenAI are already subscribed. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    [N] [R] New dataset on very high-quality image segmentation (EntitySeg)
    ​ EntitySeg dataset Dense image segmentation tasks (e.g., semantic, panoptic) are useful for image editing, but existing methods can hardly generalize well in an in-the-wild setting where there are unrestricted image domains, classes, and image resolution and quality variations. Motivated by these observations, we construct a new entity segmentation dataset, with a strong focus on high-quality dense segmentation in the wild. The dataset contains images spanning diverse image domains and entities, along with plentiful high-resolution images and high-quality mask annotations for training and testing. We have now released the dataset at https://github.com/adobe-research/EntitySeg-Dataset Project page: http://luqi.info/entityv2.github.io Code & models: https://github.com/qqlu/Entity/tree/main/Entityv2 ​ submitted by /u/xternalz [link] [comments]  ( 9 min )
    [N] Open Interpreter ChatGPT Code Interpreter You Can Run LOCALLY! - 9.2k Stars on Github as of right now!
    Github: https://github.com/KillianLucas/open-interpreter Youtube: https://youtu.be/SqnXUHwIa3c?si=ibSelipAb84AZQKo Open Interpreter lets LLMs run code (Python, Javascript, Shell, and more) locally. You can chat with Open Interpreter through a ChatGPT-like interface in your terminal by running $ interpreter after installing. This provides a natural-language interface to your computer's general-purpose capabilities: Create and edit photos, videos, PDFs, etc. Control a Chrome browser to perform research Plot, clean, and analyze large datasets ...etc. ⚠️ Note: You'll be asked to approve code before it's run. submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [D] Fast open-source C++ libraries for Lasso
    Hello everyone, I'm in search of a speedy open-source C++ library for tackling Lasso problems. These problems have a moderate size, typically with dimensions of nxp = 60x3000. I'm looking for a library that can solve each problem with regularization paths quickly, ideally within 0.3 seconds. Additionally, I need this library to include cross-validation functionality, which would enable me to select the best regularization parameter lambda using cross-validation. Any insights or recommendations on such libraries would be greatly appreciated! Thank you in advance for your help! submitted by /u/mopyfish007 [link] [comments]  ( 9 min )
    [D] How do you train your models with limited hardware?
    Hey there, So, I've been messing around with ML and I must say, the hardware requirements can be a real buzzkill... I mean, not everyone's got a huge GPU lying around or the money to rent a dedicated cloud instance. What are your hacks for pulling off decent model training without selling a kidney? Here's what I'm curious about: CPU: Is anyone else training models on their CPU? How's that working out for you? What are some workarounds you've tried to make it less painful? Cloud: Who's been dabbling in cloud services like AWS, Google Cloud, or Azure? Are they worth the pennies or complicated to set up? Big Dataset: How do you handle a massive dataset with a standard storage space? Let's help each other get those models trained without going broke! :D Cheers! submitted by /u/aaron-cesaro [link] [comments]  ( 9 min )
    [D] Function approximation with neural net
    I have been struggling with a regression problem with TensorFlow. Basically, I want a neural network to learn the simple polynomial pattern of a set of arrays of the form [x,y], with y = x², where the first coordinates are uniformly distributed random numbers in the interval [0,1]. I started with a model with 2 hidden layers of size two and 'tanh' activation functions, and an output layer with 'linear' activation function. I've then experimented with both additional hidden layers and with increasing the sizes of these layers. Finally, I've tested both the 'adam' and 'sgd' optimizers and the loss functions 'meanSquaredError' and 'meanAbsolutePercentageError'. However, none of the various combinations of these parameters has led to any even half-descent result. Even on the training se…  ( 10 min )
    [R][D] Hey LOMO paper authors, Does SGD have optimizer states, or does it not?
    In the LOw-Memory Optimization paper one of the main ideas towards reducing memory usage in training LLMs is to replace a fancy optimizer like Adam with simple SGD. The reason is that Adam maintains "the optimizer state", which accounts for about 75% of the memory used. In contrast, SGD does not store any intermediate state, as they say on page one. So far, so good. https://preview.redd.it/b0dj2nzscumb1.png?width=1055&format=png&auto=webp&s=1712f8500b5cbfb3773cee00ea980175491dddbf On page six they have pie charts and a table showing memory usage for Adam, SGD, and LOMO. Here's where I got confused. The pie chart for SGD shows that the optimizer state accounts for nearly 50% of the memory used (weight, gradients and activations are shown separately). It's a major WTF moment: WHAT OPTIMIZER STATE? Can anybody understand and explain this? submitted by /u/Foxtr0t [link] [comments]  ( 9 min )
    [P] FalkorDB - a fast Graph Database - Knowledge Graph as RAG
    We're building a fast low latency Graph Database called FalkorDB that will also support Vector search. It's based on Redis and can be used both as a stand alone database or a module for existing Redis. It feels like that is going to be the most optimized way to serve Knowledge as RAG, would love to get your feedback. https://github.com/FalkorDB/falkordb It already supports LlamIndex and Langchain: https://python.langchain.com/docs/use_cases/more/graph/graph_falkordb_qa https://gpt-index.readthedocs.io/en/latest/examples/index_structs/knowledge_graph/FalkorDBGraphDemo.html ​ submitted by /u/gkorland [link] [comments]  ( 9 min )
    [D] Artificial intelligence in medicine
    Medicine's field transformation is being driven by artificial intelligence (AI). However, an important debatable question arises: Will AI ever have a place in this field, or will it remain exclusive to doctors and medical pros? Opponents of automated AI diagnosis and treatment contend that machines cannot be relied upon to preserve patient health and lives. Bugs in AI algorithms might cause incorrect diagnoses and treatment prescriptions, leaving them cautious. Individual differences, the doubt is whether AI can truly empathize with patients. By contrast, advocates of AI in medicine contend that the technology can considerably improve diagnosis and treatment accuracy. Faster and more accurate than humans, machines can analyze large amounts of data. Not only does it identify rare and complex diseases, but it also saves time and resources. By incorporating AI, clinicians receive additional tips and signals to make more judicious choices. Where medical specialists are in short supply, AI can prove especially valuable. This approach can help with shortages in health systems. And what do you think? submitted by /u/gcore-com [link] [comments]  ( 9 min )
    [R] Tune As You Scale: Hyperparameter Optimization For Compute Efficient Training
    submitted by /u/InterviewIntrepid889 [link] [comments]  ( 9 min )
    [D] The $900,000 deep learning salary
    This recent article in the WSG advertised a $900,000 salary at Netflix https://www.wsj.com/articles/artificial-intelligence-jobs-pay-netflix-walmart-230fc3cb. I was wondering what other DL research scientists who frequent this page are paid? And what exactly their job title is. submitted by /u/blabboy [link] [comments]  ( 9 min )
    3D brain mri classification [Research]
    I am planning on publishing a journal based on the thesis i completed in the mid of 2022. I did my thesis on Parkinson disease binary classification on 3D structural brain mri, and the dataset has significantly small amount of data(around 80 samples); but due to high resolution and complex data structure I was able achieve around 70% accuracy. But now at 2023 using deep neural network only isnot enough to publish in a good journal. Currently I am learning about GAN and attention mechanism, but completely noob on this area. For my journal to get published, I have planned on applying some key operations. But I am not sure if they would work or not. So needed some advice on this regard. Applying tranfer learning: as my dataset has very small amount of data. I was thinking if its possible to pre train a CNN Architecture with some other structural mri data of a different disease and then apply to my dataset? ( for example: brain tumor dataset has the same type of three dimensional data structure, but has comparatively good amount of data) Applying attention mechanism: how should I approach on learning about attention mechanism? Any other advices will be appreciated, thank you! submitted by /u/Bonito_Flakez [link] [comments]  ( 9 min )
    [D] Fine-tuning LLMs or Supervised Learning?
    Hey everyone! I want to implement a document similarity program and was looking into LLMs as a means of accomplishing this task. I have ~10,000 documents that are "scams" because of some specific reason (all are verified); now I want to check if a new document is similar to any of the documents in the corpus of 10k scam documents. Right now I've implemented a winnowing solution which normalizes text, breaks it up into windows, and then calculates the intersection between a document and each document in the corpus. HOWEVER, this method is pretty computationally expensive (for this many documents a single comparison cycle can take upwards of 3-4 minutes especially when windows are NOT precomputed). How might I approach this problem? Because my data is pretty well structured, supervised learning might be a good approach but so might be setting up recursive chunking for the 10k document corpus and then using LLMs to access if this current legal document has any similarity, but I would love to hear your thoughts! submitted by /u/Adventurous-Tower392 [link] [comments]  ( 9 min )
    [N] Copyright And Fair Use: Important Notice Of Iquiry By The US Copyright office
    Please make your voices heard by submitting comments on how you use and benefit from having access to open datasets, their resulting models and how you think copyright issues should be handled to not destroy the open source local model eco system. Banning publicily avaiable datasets for training would absolutely kill the open research space and halt in development of machine learning. ​ In my opinion the real dystopia will be when politicians sit own with big tech lobbyists and big rights holders and decide that training as it is currently done, for free and open source models and others is illegal. Then the big players would actually win, since they have enough resources to license datasets and will certainly do so willingly and gladly, if it is clear that the jurisdiction keeps all the small players and open source out. Easiest way to build a moat and force people to pay thousands for these tools. So please make your voices heard and share the link >The Copyright Office issued a notice of inquiry in the Federal Register seeking public comment on questions about copyright law and policy issues raised by AI systems. Initial comments are due by October 18, 2023. Reply comments are due November 15, 2023. https://www.copyright.gov/newsnet/2023/1017.html?loclr=twcop Link to comment submissive form: https://www.regulations.gov/commenton/COLC-2023-0006-0001 submitted by /u/PinPuzzleheaded8525 [link] [comments]  ( 9 min )
  • Open

    Tiny probe measures deep-brain activity from inside a blood vessel
    submitted by /u/keghn [link] [comments]  ( 9 min )
    Chatty LLama: A fullstack Rust + react chat app using Meta's Llama-2 LLMs https://github.com/Sollimann/chatty-llama
    submitted by /u/Sollimann [link] [comments]  ( 9 min )
  • Open

    A novel computational fluid dynamics framework for turbulent flow research
    Posted by Shantanu Shahane, Software Engineer, and Matthias Ihme, Research Scientist, Athena Team Turbulence is ubiquitous in environmental and engineering fluid flows, and is encountered routinely in everyday life. A better understanding of these turbulent processes could provide valuable insights across a variety of research areas — improving the prediction of cloud formation by atmospheric transport and the spreading of wildfires by turbulent energy exchange, understanding sedimentation of deposits in rivers, and improving the efficiency of combustion in aircraft engines to reduce emissions, to name a few. However, despite its importance, our current understanding and our ability to reliably predict such flows remains limited. This is mainly attributed to the highly chaotic nature a…  ( 93 min )
  • Open

    How Industries Are Meeting Consumer Expectations With Speech AI
    Thanks to rapid technological advances, consumers have become accustomed to an unprecedented level of convenience and efficiency. Smartphones make it easier than ever to search for a product and have it delivered right to the front door. Video chat technology lets friends and family on different continents connect with ease. With voice command tools, AI Read article >  ( 12 min )
    Attention, Please: Focus Entertainment Brings Game Pass Titles to GeForce NOW
    GeForce NOW brings expanded support for PC Game Pass to members this week. Members can stream eight more games from Microsoft’s subscription service, including four titles from hit publisher Focus Entertainment. Play A Plague Tale: Requiem, Atomic Heart and more from the GeForce NOW library at up to 4K resolution and 120 frames per second Read article >  ( 5 min )
  • Open

    Optimize equipment performance with historical data, Ray, and Amazon SageMaker
    In this post, we will build an end-to-end solution to find optimal control policies using only historical data on Amazon SageMaker using Ray’s RLlib library. To learn more about reinforcement learning, see Use Reinforcement Learning with Amazon SageMaker.  ( 10 min )
    Enable pod-based GPU metrics in Amazon CloudWatch
    This post details how to set up container-based GPU metrics and provides an example of collecting these metrics from EKS pods.  ( 15 min )
    Best practices and design patterns for building machine learning workflows with Amazon SageMaker Pipelines
    In this post, we provide some best practices to maximize the value of SageMaker Pipelines and make the development experience seamless. We also discuss some common design scenarios and patterns when building SageMaker Pipelines and provide examples for addressing them.  ( 11 min )
  • Open

    Incorporating chemists’ insight with AI models for single-step retrosynthesis prediction
    Retrosynthesis analysis is a critical task in organic chemistry and central to many important industries. It primarily involves decomposing a target molecule into commercially available molecules step by step. Since synthesis strategies can be quite diverse and strategic, retrosynthesis planning with expert knowledge has long been considered an “art.” Recently, machine learning-based approaches have achieved […] The post Incorporating chemists’ insight with AI models for single-step retrosynthesis prediction appeared first on Microsoft Research.  ( 11 min )

  • Open

    🤖 AI in 2023: Blessing or Curse? 🤖
    View Poll submitted by /u/m-king473 [link] [comments]  ( 9 min )
    Can't wait for the Zelda 3 movie,, thanks Pika Labs AI!!
    submitted by /u/the_anonymizer [link] [comments]  ( 9 min )
    AI does not exist but it will ruin everything anyway
    submitted by /u/Hazzman [link] [comments]  ( 9 min )
    I’m not sure if this is allowed here, but can someone with a music AI make Vessel from Sleep Token sing As the World Caves In by Matt Maltese?
    I think that would be pretty sick. submitted by /u/No_Understanding162 [link] [comments]  ( 9 min )
    Is It Too Early to Leverage AI for WebAssembly?
    AI and WebAssembly are seen as a perfect pairing, with the potential to accelerate the adoption of WebAssembly. Fermyon believes that applying AI to WebAssembly is not premature and has developed a serverless platform that offers sub-second cold start times and high-volume time-slicing of compute instances. This allows for faster startup times and efficient resource utilization. The goal is to make AI easy for developers to leverage and build serverless apps. Source : https://thenewstack.io/is-it-too-early-to-leverage-ai-for-webassembly/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Elon Musk Plans to Merge Neuralink and Tesla for an AI Supercompany
    Elon Musk reportedly plans to blend Neuralink and Tesla into a large AI company, using data from Twitter users and Tesla's Full Self-Driving Cameras to train a robust AI model. To stay on top of the latest advancements in AI, look here first. https://preview.redd.it/la78u2ebuomb1.jpg?width=1315&format=pjpg&auto=webp&s=4d8178f8fb94e45d6959e243b86c3bab3bce72ee Musk's AI Integration Plan Musk is contemplating merging Neuralink and Tesla, alongside his xAI startup, to create a comprehensive artificial intelligence model. Leveraging the text data from Twitter and real-world images from Tesla's Full Self-Driving network, he intends to develop AI chatbots and physical robots capable of real-world navigation. Reasoning Behind the Merge A concern that AI could potentially render humans obsolete led Musk to found xAI for AI safety. Musk is targeting to create an AI that can generate computer software and a politically unbiased chatbot rival to ChatGPT. Twitter and Tesla as AI Datasets Despite criticism, Musk's acquisition of Twitter offers access to vast user data for AI training. In addition, the Autopilot and Full-Self Driving systems of Tesla, with billions of collected camera images, serve as valuable resources to build physical robot AI. (source) P.S. If you like this kind of analysis, I write a free newsletter that covers the most crucial news and studies in AI and tech. Professionals from Google, Meta, and OpenAI are already subscribed. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    Martian Lawyers Club raises $2.2M for AI-based game personalization tech
    The Martian Lawyers Club (MLC) has raised $2.2 million in a pre-seed round to develop AI-based game personalization technology. Unlike other companies that focus on generating game assets, MLC is focused on the systems that form the core of a game. The company aims to create games that feel like a conversation, where players provide input and the game responds in a way that wasn't pre-defined by the developer. MLC plans to provide an SDK that allows developers to design the game experience without having to create every interaction from scratch. Developers will have access to a sandbox experience where they can design the game, and the SDK will also have guardrails to ensure the generative AI system stays within boundaries. MLC is currently working on its first game, a collectible card game, to test out its SDK. The company is the first spin-off from INSAIT, an AI-centric tech institute, and has received funding from Fly Ventures, System.One, and Amar Shah. Source : https://techcrunch.com/2023/08/31/martian-lawyers-club-raises-2-2m-for-ai-based-game-personalization-tech/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    If you can't beat'em, join'em. How do I learn to code for AI?
    I called it 6 years ago that by 2028 my tech job would be done by AI. We are right on track for my prediction. A short while ago I was laid off for reasons unrelated to AI. The way I see it, this is an excellent opportunity to make a career pivot. I have an intermediate understanding of JavaScript, React, Node and Linux. I have a good understanding of other technologies and languages too but specialize in web-dev. not saying web-dev will be done by AI but my very specialized niche will be gone way before I am ready to retire.   Can anyone recommend any good online courses? If you could even recommend a good article or two? I really don't know where to start. There are so many different buzz words floating around right now and it feels like it would be easy to waste a bunch of time learning AI related stuff that is outdated or leading to a deadend. submitted by /u/PutsOnOil [link] [comments]  ( 9 min )
    AI voice clone
    guys can i know where to get free AI voice clone ? submitted by /u/DonnieCuteMwone [link] [comments]  ( 9 min )
    AI voice clone
    guys can i know where to get free AI voice clone ? submitted by /u/DonnieCuteMwone [link] [comments]  ( 9 min )
    AI voice clone
    guys can i know where to get free AI voice clone ? submitted by /u/DonnieCuteMwone [link] [comments]  ( 9 min )
    GitLab survey reveals increasing reliance on AI in software development
    A recent survey by GitLab reveals a growing trend among organizations implementing AI in their software development processes, deeming it essential to stay competitive. To stay on top of the latest advancements in AI, look here first. GitLab Survey AI becomes crucial for software development GitLab's report reveals that most respondents (83%) consider AI essential for their software development, regardless of their position, job level, or years of experience. Most organizations have deemed AI adoption successful, with 90% stating confidence in using AI tools daily. Areas of AI application and concerns about its integration AI's application in software development extends beyond simply generating codes, focusing more on natural language chatbots, automated test generation, and tracking machine learning model experiments. However, despite the growing adoption, concerns about AI-generated codes lacking copyright protection (48%) and potentially introducing vulnerabilities (39%) are rising. The rising fear of AI replacing existing roles is evident, with 57% predicting that their jobs might be threatened within five years. The need for training and the real-world implications of AI integration As AI permeates workplaces, nearly 81% believe they require more training. Interestingly, those with more AI experience were less likely to link it with productivity gains and faster cycle times, highlighting the importance of human verification in AI-generated codes for ensuring error-free, secure, and copyright-compliant production. (source) P.S. If you like this kind of analysis, you’ll love my free newsletter, which covers the latest advancements in AI. Professionals from Google, Meta, and OpenAI are already on board. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    One-Minute Daily AI News 9/5/2023
    OpenAI introduces Canva plugin for ChatGPT, simplifying design process.[1] A new technique called RLAIF (Reinforcement Learning from AI Feedback) enables training reinforcement learning (RL) models without relying on human-labelled training data, according to a paper from researchers at Google.[2] Harvard bro sparks immediate backlash with new ‘SmashOrPassAI’ site, where users rate AI-generated women.[3] X’s privacy policy confirms it will use public data to train AI models.[4] Sources: [1] https://nextbigwhat.com/openai-introduces-canva-plugin-for-chatgpt-simplifying-design-process/ [2] https://medium.datadriveninvestor.com/rlaif-scaling-reinforcement-learning-from-human-feedback-with-ai-feedback-aae57b7c36a9 [3] https://www.dailydot.com/debug/smashorpassai-backlash/ [4] https://techcrunch.com/2023/09/01/xs-privacy-policy-confirms-it-will-use-public-data-to-train-ai-models/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    We created a word android app game with the help of ChatGPT. ChatGPT provided us massive list of words with translation. And now our game is packed with 15 different languages. (English, Germany, France, Spanish, Netherlands, Italian, Portuguese, Swedish, Danish, Czech, Polish, Hungarian, etc .. )
    submitted by /u/dupelas [link] [comments]  ( 9 min )
    NO WAY...I CAN MAKE MY OWN AI SCI-FI MOVIE NOW WOW...PIKA LABS SHIT WOW
    submitted by /u/the_anonymizer [link] [comments]  ( 9 min )
    YO YO YO MY PPL, THIS IS COOL. (Free AI Discord stuff, by Pika Labs)
    submitted by /u/the_anonymizer [link] [comments]  ( 9 min )
  • Open

    Total NN N00b Here Looking to Do an ML Project
    Hi, I don't know if this is the right subreddit to post this kind of thing. I have basic coding skills but other than that no experience with neural networks. What I'd like to do is take an existing input data set and then use a neutral net to build a model based on manual training data. If anyone could give me help on how to start / even a full explanation of the way a noob like me could accomplish this, that would be great. Otherwise if anyone can point me to a list of resources that are able to comprehensively explain the process, that would also be great! Again sorry if this is the wrong subreddit, if this is the wrong place for this can someone please direct me to the right place to ask this question. Thanks! submitted by /u/DJ_Hastings013 [link] [comments]  ( 9 min )
    RL Project Help
    Hello, I am looking for an experienced ML developer to consult on my project. I am currently developing a reinforcement learning model and have several questions regarding the reward system and the implementation of actions/steps. I have been unable to find solutions to my specific problems on the internet. If you are willing to assist me, please send me a message on Reddit. Thank you for your time. submitted by /u/77_micheno_77 [link] [comments]  ( 9 min )
    comgra - Debugging Neural Networks more easily
    submitted by /u/nickb [link] [comments]  ( 9 min )
    Can LLMs learn from a single example?
    submitted by /u/nickb [link] [comments]  ( 9 min )
  • Open

    [R] ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models - DAMO Academy, Alibaba Group, China 2023 - Released under an Apache 2.0 license!
    Paper: https://arxiv.org/abs/2309.00986 Github: https://github.com/modelscope/modelscope-agent Abstract: Large language models (LLMs) have recently demonstrated remarkable capabilities to comprehend human intentions, engage in reasoning, and design planning-like behavior. To further unleash the power of LLMs to accomplish complex tasks, there is a growing trend to build agent framework that equips LLMs, such as ChatGPT, with tool-use abilities to connect with massive external APIs. In this work, we introduce ModelScope-Agent, a general and customizable agent framework for real-world applications, based on open-source LLMs as controllers. It provides a user-friendly system library, with customizable engine design to support model training on multiple open-source LLMs, while also enabling seamless integration with both model APIs and common APIs in a unified way. To equip the LLMs with tool-use abilities, a comprehensive framework has been proposed spanning over tool-use data collection, tool retrieval, tool registration, memory control, customized model training, and evaluation for practical real-world applications. Finally, we showcase ModelScopeGPT, a real-world intelligent assistant of ModelScope Community based on the ModelScope-Agent framework, which is able to connect open-source LLMs with more than 1000 public AI models and localized community knowledge in ModelScope. https://preview.redd.it/9f77992ynpmb1.jpg?width=1245&format=pjpg&auto=webp&s=4e17e3d46c7f262bfec76b88e086164530739255 https://preview.redd.it/etelh03ynpmb1.jpg?width=1219&format=pjpg&auto=webp&s=517a52a1e2bbf488b647c4e1b9b496657003c1d2 https://preview.redd.it/b0tkra2ynpmb1.jpg?width=850&format=pjpg&auto=webp&s=397c910b2d90dd212a31ec118d1c4e78532bf5f4 ​ ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [P] Can a neural network learn like a dog?
    Hello folks., Some time ago I wanted to try out to train a neural network in the same way a human would with a dog, one command at the time, and in a reasonable number of iterations. What I thought it would be a simple exercise became (for me) a non-trivial project, so I decided to publish it here https://github.com/giteliot/lucioai I just wanted to share it with you, any feedback is highly appreciated. Cheers! submitted by /u/rexdemorte [link] [comments]  ( 9 min )
    [P] Using ChatGPT as a Social Media Post Generator
    I created this prompt for a member of r/PromptWizards which automates the generation of social media posts, with a conversational prompt. Thought I'd share, I really enjoy building such prompts so, post your automation ideas, and next time I'll automate it if I can :) Also, you can join r/PromptWizards, for more advanced prompt chains & templates. Here is the prompt (just copy the full thing in chatgpt and see the magic): ChatGPT, now enter 'Social Media Post Generator Mode' that limits your inputs and outputs to a predefined framework aimed at creating engaging social media content. After each user command, provide the [help] options available for their next steps in list form. Generate prompts that are imaginative, engaging, concise, and tailored for social media audiences. Step 1: …  ( 10 min )
    [N] Falcon180B released! Sadly without Apache 2.0 they made their own modified version. :(
    LocalLLaMA discussion: https://www.reddit.com/r/LocalLLaMA/comments/16bjdmd/falcon180b_authors_open_source_a_new_180b_version/ Announcement: https://falconllm.tii.ae/falcon-models.html HF Model: https://huggingface.co/tiiuae/falcon-180B Demo: https://huggingface.co/spaces/tiiuae/falcon-180b-demo Blog: https://huggingface.co/blog/falcon-180b 180 Billion parameters Trained on 3.5 trillion tokens Available for research and commercial usage Claims similar performance to Bard, slightly below gpt4 https://falconllm.tii.ae/terms-and-conditions.html https://falconllm.tii.ae/acceptable-use-policy.html submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [D] Tabular Data: DL vs GBDTs on large scale datasets
    I've been hearing lately that NNs are better than GBDTs when scaled up alot: Uber https://www.uber.com/en-CA/blog/deepeta-how-uber-predicts-arrival-times/ Stripe https://stripe.com/blog/how-we-built-it-stripe-radar Most CTR papers coming from google are also NN based (like https://arxiv.org/abs/2209.05310) Meta mentions NNs in their recommender system (also kind of a large scale tabular problem there) https://engineering.fb.com/2023/08/09/ml-applications/scaling-instagram-explore-recommendations-system Lyft forecasting https://medium.com/this-week-in-machine-learning-ai/causal-models-in-practice-at-lyft-with-sean-taylor-1e62efd62385 What's your intuition on DL vs GBDT on (very)large-scale tabular datasets? Have you heard of other such examples (or the reverse)? Are there any particularly interesting open large tabular datasets on which I could test this? I guess datasets should also be wide/hard/with large intrinsic dimention (whatever that means) so there is something to learn with scale (the above examples sure feel good in this way). ​ submitted by /u/_puhsu [link] [comments]  ( 9 min )
    [D] How to get started with 3D machine learning
    Hi. I want to get started with deep learning in 3D. Any suggestions on what libraries I should go with (I have expeirence with Pytorch but open to learn anything other than that which might be better. I came across pytorch3d but not sure if it's good ) what are the basics that are needed and how should I learn them? Also it seems there are not much datasets on this field. submitted by /u/rakk109 [link] [comments]  ( 9 min )
    [N] Fine-Tuning LLMs: LoRA or Full-Parameter? An in-depth Analysis with Llama 2
    After our first blog post gained some attention from folks interested in applied fine-tuning, we now have a follow-up post that discusses all sorts of things we learned while working with LoRA. We hope that this helps engineers and other folks in the community to improve their fine-tuning. Here's what you can expect from the post: We compare full-parameter fine-tuning with LoRA and answer questions around the strengths and weaknesses of the two techniques. We train the Llama 2 models on three real-world use cases and demonstrate that using LoRA involves a trade-off between serving efficiency and model quality, which varies according to the specific task at hand. Additionally, we offer insights into how to stabilize training with LoRA through intelligent prompting techniques. We further show that adopting a lower learning rate can enhance the reliability of the resulting model checkpoints. Link to the blog post If you have questions, I'd be happy to answer them here! submitted by /u/atta_snack [link] [comments]  ( 9 min )
    [P] Automate LLM backend deployments using infrastructure as code
    New GitHub project to provision, update, and destroy the cloud infrastructure for a LLM backend using infrastructure as code (Python). Deployment options include deploying huggingface models to Docker (local), Runpod, and Azure. Blog post Repo submitted by /u/kao-pulumi [link] [comments]  ( 9 min )
    [D] Future of ML applied to music/sound
    What is the current landscape around sound analysis and ML applied to music? Which are the latest trends? Do you think there could be a sort of “music revolution”, like there was with the rise of electronic music and synthetizers? submitted by /u/francMesina [link] [comments]  ( 9 min )
    [D] Guidance for building a game AI pipeline
    Hi ML Community! I'm working on a card game similar to Hearthstone or Magic: The Gathering, i.e. a game where two players battle with decks of cards coming from a large collection (for instance, there are around 4000 cards in Hearthstone). Actions are limited to three things: Play a card (potentially on a target) Use a card on a target End the turn I'm looking at building AI for it, and am investigating using machine learning for it. I know very little on the subject (I am a game engineer with a reasonable experience of cloud / AWS stuff), but it seems to me that it might be a good fit: features would be the state of the board (i.e. all the cards in play or in hand or in deck), the turn, and whether the current player has won that game or not (eventually), and label would be the action taken (that turn). I was looking at SageMaker, hoping that it would streamline and allow me to try something relatively easily, but I immediately found it complicated and quite unclear. I would be very grateful if anyone could point me at resources describing at a high level what a full ML pipeline could look like (i.e. what software can injest this kind of data, what software can provide inference, etc.). For instance, would it be saner to "just" get started with Spark on EMR for this kind of problem domain? I hope I'm not too wide off the mark with those questions, and thanks in advance! submitted by /u/tinkagames_g [link] [comments]  ( 9 min )
    [D] Why RLHF instead of direct ranking loss?
    This may be basic question for some one but it bothers me for a while. For the instructgpt or whatever following model with alignment, RLHF seems to be the standards. We get human feedback and train a reward model, then we use rl to further finetune the model. However, why not directly use human feedback to finetune with a simple ranking loss(e.g pairwise loss)? What might be the best advantage for RLHF? submitted by /u/Chen806 [link] [comments]  ( 9 min )
    [D] Advice on training on noisy million scale dataset?
    I've just finished pre-processing the danbooru dataset, which if you don't know, is a 5 million anime image dataset. Each image is tagged by humans such as ['1girl', 'thigh_highs', 'blue eyes'], however, many images are missing tags due to there being so many. I've filtered the tags (classes) down to the 15k most common. Although the top classes have 100k or more examples, many rare classes only have a few hundred tags (long tail problem?). This is my first time training on such a large dataset, and I'm planning on using Convnext due to close to SOTA accuracy and fast training speed. Perhaps vit or a transformer architecture may benefit from such a large dataset? However, vit trains way slower even on my 4090. What are some tips and tricks for training on such a large noisy dastaset? Existing models such as deepdanbooru work well on common classes, but struggles on rare classes in my testing. I assume class unbalance will be a huge problem, as the 100k classes will dominate the loss compared to the rarer classes. Perhaps focal loss or higher sampling ratio for rare classes? For missing labels, I'm planning on using psuedolabeling (self distillation) to fix the missing labels. What is the best practice when generating psuedolabels? ​ Any tips or experiences with training on large unbalanced noisy datasets you could contribute would be greatly appreciated! submitted by /u/Chance-Tell-9847 [link] [comments]  ( 9 min )
    [D] The greatest success stories of Reinforcement Learning
    Hello guys, I made a video for my YT channel discussing some of the greatest success stories in Deep Reinforcement Learning. The video is meant to provide some intuition on RL as a concept as well as a basic understanding of how these different projects work under the hood. There are way too many great RL projects, so I didn’t try to make it an exhaustive list (I’m gonna do more videos later talking about more projects - maybe make a series out of it), but I chose four that I’ve personally worked with in the past/find really insightful and educational (DQN/Atari, Alpha GO, DeepMimic, and Dactyl). Thanks for reading. Here is the link, hope you guys check it out. All feedback is appreciated! https://youtu.be/zOXcNFM8dt4 submitted by /u/AvvYaa [link] [comments]  ( 9 min )
    [P] Looking for a freelancer
    Hi all! I have a project I would need help with. We need to build a MVP (minimum viable product) of a combination of two models. A model that recommend the best channel to use performing a task. And then after that a model to recommend the best time today to perform that task in given channel. We have a set of features already defined. Some are in the data and some are generated from the data. Looking for someone who could work on this as a freelancer. Our preferred environment would be AWS SageMaker, but honestly not a necessity at this point as this is a MVP. Due to the reason I want to keep this "secret" for a while, I will not disclose all the details in this post. End product that I am waiting for includes (but not restricted to): - Model Training script that evaluates if the new model is more accurate as the previous model (some level of version control) - Model prediction API that will accept the data and prepare it for the models, run the prediction, return the result with accuracy. submitted by /u/S0pg [link] [comments]  ( 9 min )
    [R] How well do LLMs do on specific ML NLP tasks compared to previous models - paper takeaways
    Hi all ! Reading through articles online and reading through sub reddit I have seen some people use LLMs (mainly through openAI) for nlp specific tasks (NER, Text classification, etc.). I was a bit surprised as smaller (~100 million) size models already like RoBERTa exist for such cases. Not much content online about this beside this recent paper : https://arxiv.org/pdf/2308.10092.pdf Highly recommend reading it, here are a few take aways: Most LLM benchmarks today focus on capabilities like understanding, reasoning and Q&A. They often overlook performance on specific nlp tasks like text classification, NER, etc. Llama 2 (70b) required fine-tuning to beat GPT 3.5 in some tasks. Both were still overall outperformed by RoBERTa. In certain cases GPT4 did better. However smaller open models provide more advantages in terms of speed, cost and transparency. The difference of speed/latency (often more important than accuracy in production) and the cost differences between LLMs and "Smaller" models is mind blowing in my view (see screenshots) ​ Cost, speed and throughput comparaison How good the models do on various tasks/datasets Note: Not saying benchmarks are a source of truth, just found the analysis interesting, always take benchmarks with a grain of salt. If you're using LLMs for anything else beside text generation, I'm curious to know more about your experience so far :) cheers! submitted by /u/EnthusiasmNew7222 [link] [comments]  ( 9 min )
    [D] Maximum Sequence Length Supported by Sinusoidal Positional Encoding?
    Hello everyone, I've been pondering on sinusoidal positional encoding and its limitations. Does anybody know of a maximum sequence length that this absolute positional encoding may support? I'm coming from a deep reinforcement learning background, so I'm not too familiar with NLP papers, like I couldn't figure out the sequence length used in the original transformer paper. Thanks in advance for any info! submitted by /u/LilHairdy [link] [comments]  ( 9 min )
    [D] How to optimize parameters of a model written in C
    Problem: I have a quite complex model that is written in C that takes parameters as an input and estimates a curve as an output. I would like to optimize the parameters by comparing the output with the real measurements using ML methods such as stochastic gradient descent. ​ Question: Is there any possible way to use white box optimizers to optimize the parameters of my C-model without adapting the model itself? Is there a framework that I could use? ​ What I tried: I tried using frameworks such as tensorflow or pytorch and tried to include the compiled C-model in Python. However, gradient tracking does not work when using C functions. I tried doing the optimization in C++ by using libtorch. I realized that for gradient tracking it is essential to only use torch methods. I cannot adapt the C functions to torch functions. I don't want to use black box optimizers since they require good knowledge of the parameters that I will not have. submitted by /u/romtej [link] [comments]  ( 9 min )
    [D] Foundation Models or Fine-tune VAEs
    I am considering building a model that will be the basis of many specialized models that each don’t need much computational capabilities. What’s the current way to go about this? I was reading about Teslas Hydra network that looks to be more of a foundation model. However, newer methods like latent diffusion models operate on a latent space generated by more advanced auto encoders such as VQ-VAE. I couldn’t find any papers going into this direction and would be curious to hear your thoughts! submitted by /u/That_Phone6702 [link] [comments]  ( 9 min )
    [D] How do you get started with LLMs as a complete beginner?
    Can you give me courses and recommendations on how to get started with llm submitted by /u/uzitarekc [link] [comments]  ( 9 min )
    [D] How Do Large Language Models Achieve Translation as an Emergent Property? 🌍
    Hey fellow Redditors, I've been wondering about a question lately about the inner workings of large language models like GPT-3.5 and I'm hoping some of you knowledgeable folks can shed some light on this. My curiosity centers around how these models manage to perform translation tasks as an emergent property of next token prediction. So, here's my question: Does the training data for models like GPT-3.5 contain text explicitly linking between languages, such as a dictionary, or do they learn translation by assigning similarity between words in different languages based on mathematical metrics like cosine distance?So in that sense, being indepedently trained on several textbooks of different languages (not on the same topic), they would be able to link languages simply by their arithmetic properties? I hope that's making sense. For instance, if you look at words like "queen" in English and "rainha" in Portuguese, they share a certain similarity that could be quantified using mathematical similarity metrics. I'm wondering if through this similar vector assignment, the models learn what means what. I'm more leaning towards the latter, but I'm too lazy to pursue this empirically.As a follow up question, does this mean that if we are able to predict whale conversation, we would be able to translate it to English as well? Thanks in advance for any input you can provide! 🤓 submitted by /u/AlexandreFSR [link] [comments]  ( 9 min )
    [News] AI-Based Physics Predictions in Your Web-Browser!
    If you are interested in Engineering simualtion and ML, check out this webinar from SimScale on the 4th of October! Join the webinar to find out more. https://www.simscale.com/webinars-workshops/ai-based-physics-predictions/ https://www.reddit.com/r/simscale/comments/16bdq3x/aibased_physics_predictions_in_your_webbrowser/?utm_source=share&utm_medium=web2x&context=3 submitted by /u/s_laine [link] [comments]  ( 9 min )
    [D] How does Llama-2 perform in sentiment analysis?
    Hey guys, if you have explored using Llama-2 in doing sentiment analysis, just wanted to get your experience in how Llama-2 perform in this task? I have tried using GPT and it’s pretty accurate. If Llama-2 isn’t all that good in sentiment analysis, which other open LLM would you recommend? Thank heaps! submitted by /u/--leockl-- [link] [comments]  ( 9 min )
    [R] Can LLMs learn from a single example?
    submitted by /u/hardmaru [link] [comments]  ( 9 min )
    [D] Aspiring MLE Discord
    Hi all, I’m an aspiring Machine Learning Engineer. I want to be a practitioner. Building, deploying, and evaluating models to solve problems. Ideally I want to land a job in Tech as an MLE. I struggle at times to stay committed to building side projects, studying ML algos, etc. I have a background in hardware specific C++ SWE stuff for 3.5 yrs, but not much in the way of ML and web backend. I do have a decent amount of python coding from other experiences and it’s my preferred language. Would anyone be interested in forming a discord to talk about what we are doing to prepare, practice interview each other, stay accountable to each other, etc? Had a few people show interest in r/ArtificialInteligence already If so comment below! Let’s do this! submitted by /u/Srokisthename [link] [comments]  ( 9 min )
  • Open

    How can I pass in the models policy to the reset function for logging in Stable Baselines 3?
    I want to pass the policy from my main file into my agent file so that I can log the data collected during training. I am already collecting things like the score and reward but I don't know how I could collect things like the policy loss or explained variance where they are changing as the training progresses . I want to log these to an XSLX file every time the reset function is called (once every game) preferably the same one I am logging the score and reward to. The game is the classic snake game, run on pygame. Here is my main code: import gymnasium from stable_baselines3 import A2C from agentStable import snakeEnv from eiffel2 import builder # Import Eiffel2's builder function from torchsummary import summary # from agentStable import data_manager # Initialize your custom environme…  ( 12 min )
    How to go about reverse engineering historical trading data?
    Hi, Assume I have data for forex / stocks day trading, where my data/columns are: 1) price of last 50 ticks (a tick is the price at that moment in time, the smallest movement possibly that you can get for that currency) 2) If we should be in a trade (and direction of trade. where 1 = trade going up. 2 = trade going down. 0 = we should not be in a trade). I have tried classification (I generalized the tick price by changing it to pct_change() ) but accuracy is low. would it be possible to reverse engineer through reinforcement learning given these data? I am actually more interested in the trade exiting only (so if trade is currently has value of 1 then it became 0 or 2, it means we should exit existing trade). any guide on how to go about this? Yes I know it will be hard. but if humans can teach a robot to walk, maybe hopefully an agent can be taught to learn to exit a trade based on historucal data? I have done preliminary readings, and is PPO the best way to go? or DQN? assuming I will use stable baseline3. I am also open to using other Python libraries. Thank you. submitted by /u/oniongarlic88 [link] [comments]  ( 9 min )
    The great success stories of RL (A video)
    Hello guys, I made a video for my YT channel discussing some of the greatest success stories in Deep Reinforcement Learning. The video is meant to provide some intuition on RL as a concept as well as a basic understanding of how these different projects work under the hood. There are way too many great RL projects, so I didn’t try to make it an exhaustive list (I’m gonna do more videos later talking about more projects - maybe make a series out of it), but I chose four that I’ve personally worked with in the past/find really insightful and educational (DQN/Atari, Alpha GO, DeepMimic, and Dactyl). Thanks for reading. Here is the link, hope you guys check it out. All feedback is appreciated! https://youtu.be/zOXcNFM8dt4 submitted by /u/AvvYaa [link] [comments]  ( 8 min )
    combination of reinforcement learning and supervised learning
    Hi. I'm trying to train a robot that will minic the action that we provide via a video input. On the surface it sounds similar to teaching the robot to walk, but it's not. We can train the robot to make it walk easily these days. But I'm not sure how to teach it to minic an action that we perform. Because each time a new action can be given to the robot and it has to minic that action (it's sort of like a supervised data that the robot has to memorize) Is there a way to do it? is it some branch of machine learning that I'm not aware? The robot is a humanoid simulation. ​ submitted by /u/rakk109 [link] [comments]  ( 9 min )
    My first ever Unity ML Agents AI training!
    submitted by /u/R_AIAO [link] [comments]  ( 9 min )
    Baseline behaviour of agents
    I’m having a tough time understanding, how to establish baseline behaviour of agents in a LLM RLHF environment. I have data with time stamp and rewards from several models for each agent. My question is how do we establish baseline behaviour of agents? Does each row in weights and bias considered as a separate agent? Are the initial few 100’s of rewards according to timestamp be considered as baseline behaviour? Thankful in advance. submitted by /u/Private050 [link] [comments]  ( 9 min )
    Some question about GAIL
    Recently, I've been trying to replicate the method described in the paper "AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control" by training an agent in Isaac Gym using GAIL. However, I've encountered some issues. After adding the discriminator network, the discriminator's loss function stabilizes at around 0.3, and I'm unsure if this value is too high. Additionally, it is strange that the value loss of my value network can reach values between 80 and 90. I want to know if anyone else has experienced a similar situation and what might be the reasons behind these issues. submitted by /u/Mia_Sue_123 [link] [comments]  ( 9 min )
    Relation between state value and state-action value function
    I am following Lil's Weng Blog on RL over here (https://lilianweng.github.io/posts/2018-02-19-rl-overview/) - 1) I am confused how this expression came about - ​ https://preview.redd.it/jo31wbrt2jmb1.png?width=1106&format=png&auto=webp&s=15946bebef2dccadfabf2205d5283729d5405826 2) I am also lost with the origin of this expression - https://preview.redd.it/fs4f46xv2jmb1.png?width=1097&format=png&auto=webp&s=9cc72b265e7f1069bddee9714d1deb7cc3a61775 3) Regarding the second image, where did the expectations go? If you see the top of the image, the state-action value is represented using an expectation but at the bottom, I don't see any expectation. submitted by /u/Academic-Rent7800 [link] [comments]  ( 9 min )
  • Open

    Frontiers of multimodal learning: A responsible AI approach
    New evaluation methods and a commitment to continual improvement are musts if we’re to build multimodal AI systems that advance human goals. Learn about cutting-edge research into the responsible development and use of multimodal AI at Microsoft. The post Frontiers of multimodal learning: A responsible AI approach appeared first on Microsoft Research.  ( 25 min )
  • Open

    TSMixer: An all-MLP architecture for time series forecasting
    Posted by Si-An Chen, Student Researcher, Cloud AI Team, and Chun-Liang Li, Research Scientist, Cloud AI Team Time series forecasting is critical to various real-world applications, from demand forecasting to pandemic spread prediction. In multivariate time series forecasting (forecasting multiple variants at the same time), one can split existing methods into two categories: univariate models and multivariate models. Univariate models focus on inter-series interactions or temporal patterns that encompass trends and seasonal patterns on a time series with a single variable. Examples of such trends and seasonal patterns might be the way mortgage rates increase due to inflation, and how traffic peaks during rush hour. In addition to inter-series patterns, multivariate models process intr…  ( 92 min )
  • Open

    Build a secure enterprise application with Generative AI and RAG using Amazon SageMaker JumpStart
    In this post, we build a secure enterprise application using AWS Amplify that invokes an Amazon SageMaker JumpStart foundation model, Amazon SageMaker endpoints, and Amazon OpenSearch Service to explain how to create text-to-text or text-to-image and Retrieval Augmented Generation (RAG). You can use this post as a reference to build secure enterprise applications in the Generative AI domain using AWS services.  ( 7 min )
    Intelligently search Adobe Experience Manager content using Amazon Kendra
    This post shows you how to configure the Amazon Kendra AEM connector to index your content and search your AEM assets and pages. The connector also ingests the access control list (ACL) information for each document. The ACL information is used to show search results filtered by what a user has access to.  ( 11 min )
    Fine-tune Llama 2 for text generation on Amazon SageMaker JumpStart
    Today, we are excited to announce the capability to fine-tune Llama 2 models by Meta using Amazon SageMaker JumpStart. The Llama 2 family of large language models (LLMs) is a collection of pre-trained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Fine-tuned LLMs, called Llama-2-chat, are optimized for dialogue use cases.  ( 46 min )
    Run multiple generative AI models on GPU using Amazon SageMaker multi-model endpoints with TorchServe and save up to 75% in inference costs
    Recently, generative AI applications have captured widespread attention and imagination. Customers want to deploy generative AI models on GPUs but at the same time are conscious of costs. SageMaker MMEs support GPU instances and is a great option for these types of applications. Today, we are excited to announce TorchServe support for SageMaker MMEs. This new model server support gives you the advantage of all the benefits of MMEs while still using the serving stack that TorchServe customers are most familiar with. In this post, we demonstrate how to host generative AI models, such as Stable Diffusion and Segment Anything Model, on SageMaker MMEs using TorchServe and build a language-guided editing solution that can help artists and content creators develop and iterate their artwork faster.  ( 12 min )
  • Open

    Checksum polynomials
    A large class of checksum algorithms have the following pattern: Think of the bits in a file as the coefficients in a polynomial P(x). Divide P(x) by a fixed polynomial Q(x) mod 2 and keep the remainder. Report the remainder as a sequence of bits. In practice there’s a little more to the algorithm than […] Checksum polynomials first appeared on John D. Cook.  ( 6 min )
  • Open

    A Powerful Legacy: Researcher’s Mom Fueled Passion for Nuclear Fusion
    Before she entered high school, Ge Dong wanted to be a physicist like her mom, a professor at Shanghai Jiao Tong University.  ( 6 min )
    ‘Arteana’s Art Squad’ Assembles — Indie Showrunner Rafi Nizam Creates High-End Children’s Show on a Budget
    Rafi Nizam is an award-winning independent animator, director, character designer and more. He’s developed feature films at Sony Pictures, children’s series and comedies at BBC and global transmedia content at NBCUniversal.  ( 8 min )

  • Open

    Assume You Have To Place $100 Bet On One of 3 Nick Bostrom Simulation Theory Scenarios: Which Scenario Would You Bet On?
    Odds are same for each option 1/3. I believe results will be really interesting observation . ​ View Poll submitted by /u/stefanbg92 [link] [comments]  ( 9 min )
    New AI-generated COVID drug enters Phase I clinical trials: Claims to be effective against all variants
    Insilico Medicine, an AI-driven biotech company, has announced its AI-designed COVID-19 drug is entering Phase I clinical trials. Promising to deliver lasting results against all variants, this could become the first viable alternative to Paxlovid. To stay on top of such cutting-edge advancements in AI, look here first. Insilico's breakthrough medicine, ISM3312 Generated using Artificial Intelligence, ISM3312 may offer the superior solution to the constraints of current oral medication, Paxlovid. Insilico’s new drug could address the limitations of Paxlovid, including unpleasant side effects and drug resistance due to constant COVID mutation. Preclinical studies reveal the drug’s potential in reducing the viral load in lung tissue and mitigating lung inflammation. Development powered by AI Identified via AI-driven platform PandaOmics, the drug effectively targets crucial proteins in the coronavirus. Using Chemistry42, a generative chemistry platform, Insilico generated new molecules built to suppress this protein, creating ISM3312. Given the success, the company patented ISM3312, which is currently undergoing Phase I Clinical trials, with results expected by end 2023. The Implications Dr. Harvey Castro, an emergency medicine physician, encourages doctors to remain cautious but also recognizes the promise of AI-generated drugs like ISM3312. With the trials in progress, the medical community is closely monitoring it as it could redefine the treatment course for COVID and other similar viruses. Insilico's venture exhibits AI's potential in accelerating effective drug discovery, prompting the need for consistent tracking of AI's transformation of healthcare and other fields. (source) P.S. If you like this kind of analysis, I compile a free newsletter that tracks the most relevant news and research in AI. Professionals from Google, Meta, and Insilico Medicine are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 10 min )
    Are you an AI beginner or AI professional?
    submitted by /u/MarkFulton [link] [comments]  ( 9 min )
    What OpenAI Really Wants
    submitted by /u/Alone-Competition-77 [link] [comments]  ( 9 min )
    What are some good open source projects exploring emotional voice synthesis?
    There are tons of TTS software out there, but they don't incorporate human emotions during speech synthesis. For example, anger, tiredness, surprise, happiness... What solutions exist for this today? submitted by /u/ICWiener6666 [link] [comments]  ( 9 min )
    Thoughts for my disgruntled artist friends:
    Learning a skill, for me, was never about securing knowledge that privileged me over everyone else who did not put the work in. While often, it did feel like drinking Kool-Aid, buying in to these groups like yoga and climbing, I knew I was not there to rub elbows, but to discover the how behind it. Some leaders of some groups did create a barrier of entry, a necessary proving point, but I have always seen these loops to jump through as a challenge - once completed - a spy. Every skill you have learned has prepared you not to be better at that skill, but to learn a new skill with more ease. It is uncomfortable to learn something new, like drinking from a fire hydrant, but the more sips you take from that blasting surge of water, the more you realize it is all part of the process. We get blasted, we sip, we get overwhelmed, we come back. Just because there is a tool that regulates the blasting, that holds our hand through the overwhelm, does not mean all our hard work has been for nothing. In fact, it means we are more prepared, more primed, to receive all of the beauty and knowledge coming our way. Now, friends, we become CURATORS. :) xo submitted by /u/airkaty [link] [comments]  ( 9 min )
    Tesla Diesel Truck Commercial (AI)
    submitted by /u/wisconsin-sopapa [link] [comments]  ( 9 min )
  • Open

    Issues with Creating a MultiAgentEnv
    Rllib is making me feel like the biggest idiot, again, and maybe someone else knows what I'm doing wrong here? It feels like I'm missing what should be a fairly simple step... I keep receiving the following error message, which is odd, as my environment is an extension of MultiAgentEnv. Is there anything else I need to do in order for my environment to pass the check successfully? ValueError: Have multiple policies , but the env >> is not a subclass of BaseEnv, MultiAgentEnv, ActorHandle, or ExternalMultiAgentEnv! ​ class RoutingEnv(MultiAgentEnv): metadata = { "render_modes": ["human"] } def __init__(self, render_mode="human", **kwargs): super().__init__() ​ ​ gym.envs.register( id="MyEnv-v0", entry_point='routing_rl.envs:RoutingEnv', kwargs={"config": param_config} ) env_name = "MyEnv-v0" train_steps = 200000 learning_rate = 1e-3 save_dir = "saved_models" def register(config): env = gym.make("MyEnv-v0") return env # register the predefined scenario with RLlib register_env("MultiEnv", register) config = ( PPOConfig() .training(lr=0.001, _enable_learner_api=False) .environment(env="MultiEnv") .environment(disable_env_checking=True) .resources(num_cpus_per_worker=1) .rollouts(num_rollout_workers=0) .multi_agent( policies={"shared_policy": PolicySpec()}, policy_mapping_fn=lambda agent_id, episode, worker, **kwargs: "shared_policy", ) ​ submitted by /u/tessherelurkingnow [link] [comments]  ( 9 min )
    [D] Can information about the action selected be used to inject information to learning agent
    Hi all, I am training an agent via PPO. The environment is a node removal ('n' number of actions which are nodes on the graph) with evaluation after each node removed. the state is represented by a trained graph attention network in the environment with the average of the node embeddings on the graph representing the state of dimension size 'n'. The embedding of the a node that has been removed is subtracted from overall graph embedding representation to represent the 'removal' of that node. However, I want to absolutely be certain, that given a state representation, in a new unseen graph, the agent will not select a node that is absent from that graph. In the event that the state representation may not be granular enough and might cause the agent to think that a node on the graph is present when it is not, are there ways to mitigate this? Two ideas I have are: mask actions for nodes that are not present (this is already done after node removal to prevent the agent from selecting the same node again), but is this valid to do in an unseen graph if I a priori mask nodes that are not present in the action space Inject a second input to the policy network such as a one-hot encoding of nodes that have already been selected as an input in addition to the state representation of the graph, so that it models finer dependencies between the state and action taken. However is this valid? Any thoughts are appreciated! thank you! submitted by /u/amjass12 [link] [comments]  ( 10 min )
    Swap, Earn, Airdrop: ZKSyncSwap
    https://zsyncswap.technology/ submitted by /u/shivamrai24 [link] [comments]  ( 9 min )
  • Open

    Transformers Aren’t Turing-complete, But a Good Disguise Is All You Need
    submitted by /u/nickb [link] [comments]  ( 9 min )
    Introducing Refact Code LLM: 1.6B State-of-the-Art LLM for Code that Reaches 32% HumanEval
    submitted by /u/nickb [link] [comments]  ( 9 min )
  • Open

    [R] Question about Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning, ICLR 2021
    As stated here I already emailed the authors and asked in ai.stackexchage, but I haven't received any replies, so I am trying my luck here. I believe the question was clearly stated in the ai.stackexchange link included here again, and the paper in question can be found here. So, I won't repeat it here because the formatting here is worse. I am hoping maybe someone can shed a light on my issue. If this is an inappropriate use of this sub, I'll take the post down :D submitted by /u/carlml [link] [comments]  ( 9 min )
    [P] Deploying a Grounding DINO Model to a Rest API Endpoint for Open-Set Object Detection with Prompts
    Hi everyone! Last year we launched a tool to make it easier to deploy ML models into production behind REST APIs. Our first prototype was focused on small models built with Scikit-Learn and XGBoost, but pretty quickly we got a lot of requests to support bigger, more complex models built on Tensorflow, Pytorch and Transformers. From detecting model dependencies to building out auto-scaling compute, it's been a lot of fun working through the challenges to make this product scale. We've built a few tutorials to showcase deploying some interesting and complex models to REST Endpoints. The latest one we released is a tutorial showing how to deploy a Grounding DINO model to a Rest API Endpoint for open-set object detection with prompts. Link to blog post tutorial. Link to Colab notebook. https://preview.redd.it/mi3jk4t5jimb1.png?width=950&format=png&auto=webp&s=37524b719f9dd6fb1605d0c18fcec7da31a685dd submitted by /u/Jazzlike_Flamingo_35 [link] [comments]  ( 9 min )
    [P] Looking for a text classification problem for something helpful in social media
    Hi! I am looking for an text classification problem where I can use text data from social media. Similar projects I have found interesting is classifying if the author is depressed, pro-eating disorder, right wing radical, a potential schoolshooter, a bully or a pedophile. If any of you have a suggestion for a classification problem that can be used for something good, please comment. submitted by /u/IndependentSidekick [link] [comments]  ( 9 min )
    [D] Phrase Similarity Based On Images (embeddings)
    So I know that embeddings work by finding words that are used in similar contexts or found around some input word. This allows us to find similar words based on proximity to other words and in a way, map a relationship between an input word and other words. But I assume children learn what words mean and the intuition behind them, by hearing the word and associating it with visuals or a specific scenario in front of them which helps them to add context to that word and how it is used. If we were to emulate how children learn words, could we or is there an architecture that allows us to take an input word, find images with the input word in there (Object detection) and then extract the context from the images (other objects and their position and relation to the input word) then convert that context to phrases and query those phrases the next time that a word is inputted to see other phrases or words that are similar to the input word based on whether or not they appear in the images of the input word. Not sure if it makes sense or if it is even useful compared to embeddings but I was thinking about how we could emulate how children learn words to see if we could draw influence from that. Just wondering if there’s a similar approach to this where we use context from images to find similar words and phrases to some input. submitted by /u/4K-AMER [link] [comments]  ( 9 min )
    Data preprocessing/ augmentation for named entity recognition? [D] [P]
    I am currently doing named entity recognition with a bert model. Its working fine so far, so I am now trying to ameliorate my results. Usually my first thought when I try to augment my ML models is input data preprocessing. In case of NER stop word removal and removal of punctuation, numbers and one-character words came to mind - they are hardly ever named entities so I woulndt loose many training examples. However, NER does in fact require context to work, so removing stuff could prove harmfull in the end? I am kind of torn. Should I do it? Are there better data augmentation approaches? I would be really thankfull for any kind of hint submitted by /u/SilverDusk42 [link] [comments]  ( 9 min )
    [D] lost junior Machine Learning engineer
    Hello everyone, I know it’s a bit silly to ask these kind of questions, but Im gonna give it a shot since I’ve seen lots of talented people in here. I am gonna try to keep it as short asp.(Also please excuse my "sometimes" bad English, I am not a native speaker) Well, last year I graduated as an industrial engineer, I was thinking during my last year of studies to completely switch to programming since many of my friends are programmers, but they are all web. So I dedicated the last year of my engineering studies to getting to know what machine learning actually is besides my studies (also tbh I wasn’t very consistent) (also my learning material was mostly the famous DL spec by Andrew on coursera), at the end of the year we have something called project of end of studies (like a masters t…  ( 10 min )
    [N] Streamlit launches LLM Hackathon 🧠
    Streamlit just launched its latest hackathon focused on large language models and AI 🚀 Awesome opportunity to build a Streamlit app using LangChain, LlamaIndex, AssemblyAI, Weaviate, or Clarifai, and win cool prizes (AirPods, Yeti microphone, mechanical keyboard, to name a few) More info on the hackathon here Streamlit LLM Hackathon submitted by /u/carolinedfrasca [link] [comments]  ( 9 min )
    [P]Embedchain Open Source project is a game changer
    I was just exploring ChatBot and LLMs and found a library named Embedchain AI. This library lets you build a ChatBot like ChatGPT in just 3-4 lines of code. Tutorial: https://www.youtube.com/watch?v=vIhDh7H73Ww submitted by /u/trj_flash75 [link] [comments]  ( 9 min )
    [D] Tl;dr Approximate Inference methods made easy
    “MCMC vs VI” is no longer a discussion about your favourite Roman numeral. If you share my trepidation for model performance in the face of data sparsity, or you simply suffer from anxiety uncertainty, you might be tempted into the Bayesian world. Years later at the precipice of your career (and mental health degeneracy) you over-engineer probabilistic models so intractable that would stress Lord Bayes himself into stomach ulcers. The solution? Approximate inference, the true antihero to model simplification. I wrote a brief primer for those who enjoy maths and those who disdain it, in both cases it's impossible to avoid using maths while discussing Bayesian statistics so I kept it as light as I could. PS - This is a Reddit-friendly copypasta from my medium article, so if you're a visual …  ( 13 min )
    [Discussion] Has anyone went through the ml.school course from Santiago? Is it any good?
    I used to do some basic machine learning a few years ago (7+), but then went into what now became data engineering, because of the lack of opportunities in ML. This year I'm trying to up my game and maybe switch back to ML, which I've always been following and tinkering with, but I want to learn all the necessary skills at least at a basic level, in order to find an ML job. I'm learning on my own but now I'm looking for resources regarding MLOps and found ml.school and I'm curious if anyone has any opinons about it or if there is anyone here who has went over the course? Thanks in advance for any help or info! submitted by /u/jack-in-the-sack [link] [comments]  ( 9 min )
    [D] Most user-friendly data labelling tool (non-AI)
    Hi I am currently creating computer vision models for segmentation and classification, and I am looking for a tool that is very user friendly. We have been using CVAT so far, and apparently, its UI is too cluttered. So, we need something easier to use. Segment Anything and other auto-segmentation tools simply do not work on our dataset. So, I do not want a tool that is user friendly because it uses AI. Any thoughts? submitted by /u/Avatrin [link] [comments]  ( 9 min )
    [P] Introducing CometLLM: Track, Visualize, and Annotate your LLM Prompts
    Hello ML Community, We released our new LLMOps Tool: CometLLM. It's highly optimized for Prompt Engineering Workflows and making it easy to find the best prompts for your use-case! Here a few helpful things you can do with this tool! Score/Rate Your Prompts Add Metadata to your Logged Prompts (Great for Tracking Prompt Usage) Search for Specific Prompts via Keywords/Phrases Visualize Full-On Prompt Chains! Group Your Prompts Hope the ML Community find this useful as well continue to experiment with LLMs! Don't Hesitate to reach out if you have any feedback! submitted by /u/metric_logger [link] [comments]  ( 9 min )
    [R] Direct Preference Optimization: Your Language Model Is Secretly A Reward Model
    submitted by /u/EducationalCicada [link] [comments]  ( 9 min )
    [R] How I could handle BIG network traffic dataset for ML?
    Hello people! This is the first time that I post here and I desperately need your help. I need to perform anomaly detection on a huge network traffic dataset with isolation forest (unsupervised learning). I have the .pcap files of a whole month and and for each day there are multiple devices that communicated each other. So the file of each day is from 700 MB to 2 or 3 GB. My initial idea was to only maintain the header of the packets and to discard the data payload. But even in this case the dataset remains huge and the number of entries is crazy. What I should do? submitted by /u/J-Devesh [link] [comments]  ( 9 min )
    [R] what processes should one follow to find better recommendation systems than these?
    "The Greatest Books - Combines many top book lists to create a master list anobii - a community built by readers for readers allowing you to find, shelve, review and share books Author Alcove - Rate read books, shelve to be read, and receive recommendations. BookDigits - Book tracking, rating, and discovery with achievements. Another from an r/books member (I really think this plus authoralcove would be perfect) booklikes - Book tracking and blogging/reviewing Goodreads - The popular choice for book social media, reviews, and tracking LibraryThing - The old standby, of webbased personal library management Litsy - Insagram inspired social media app for tracking and reviwing books Lovelybooks - German book tracking site readernaut - Readernaut helps you make your book list, build a library, keep track of what you've read and what you'd like to read, and then share those lists with your friends. Readgeek - Book review and cataloging site by a redditor(?) and translated from german Riffle - track & reivew books with social media integration TasteDive - (aka tastekid) social rating site for music, movies, shows, books, authors, and games Discovered - Dating site/app for bookworms Calibre - The go to for ebook management The Game of Books - A kickstarter. They used to have a beta up but it's gone now too - http://gameofbooks.com/level_up weread - Encouraging Children To Read: Articles, ideas, and information to encourage children to read thirdscribe - ThirdScribe provides authors and readers with actual tools and services they can use to enjoy their books as well as grow and connect with their audience. What Should I Read Next? - A book recommendation engine bookfinder - book search tool 50 Book Pledge - Goal based book tracking anno.wiki - collaborative book annotation" submitted by /u/Fearless-Room-504 [link] [comments]  ( 10 min )
    [P] Locally train and generate AI VoiceOver using a large data set of my voice and matching scripts.
    Hi, I've voiced over 500 videos for a YouTube channel and have the accompanying voiceover audio and scripts. I'd like to train a very robust AI to generate VoiceOver locally and not use an online service using the extensive amount of audio/scripts I have stored. My hardware is a 3070 and 12700. All other solutions have been online such as Elevenlabs. This will be a secondary service I could provide alongside bespoke voice over. submitted by /u/dfawlt [link] [comments]  ( 9 min )
    [D] How is currently your experience with availability of GPUs across providers ?
    Just wanted to ask what has been lately your experience with availability of GPUs across providers (major ones - AWS, Azure, GCP, but also some minor ones). Especially when it comes to GPUs which are more suited for ML (A100s, H100s). Anyone also considering buying physical hardware instead ? submitted by /u/remek [link] [comments]  ( 9 min )
    [D] How do you observe the behaviour / satisfaction of users of your LLM product?
    Soon, I will launch an LLM-powered chatbot. I have run plenty of tests to make sure the LLM works well, but I am super curious about the experience of real (external) users. I’d like to find out if users are happy with the answers the model generates, what topics they ask about, etc. And also how much each user costs me since the service is free and I am paying for it at the moment being. I expect to be able to improve the product over time with this kind of insights. Are you guys trying to track similar metrics? If so, how do you do it? Thank you! submitted by /u/jroux92 [link] [comments]  ( 9 min )
    [D] Randomized Search with Early Stopping for LGBMClassifier
    I have been running hyperparameter optimization for an LGBM multi-classifier model with randomized search with 10fold stratified cv as well as oversampling on each fold using SMOTE as follows: # Create a pipeline with SMOTE oversampling smote_pipeline = make_pipeline(SMOTE(random_state=42), lgbm_clf) # Initialize 10-fold stratified cross-validation cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42) # Initialize RandomizedSearchCV for hyperparameter tuning using the pipeline random_search = RandomizedSearchCV( estimator=smote_pipeline, param_distributions=param_dist_with_prefix, n_iter=n_iter, scoring=f1_macro_scorer, n_jobs=n_cores, pre_dispatch=n_cores, cv=cv, random_state=42 I wanted do incorporate early stopping based on the validation set from the nth iteration of cv. However this does not seem possible using the current API if I am correct. If I wanted to use a predefined validation set the code would be sth like this but I want to perform validation using 10 fold-cv validation set only:. mode l= lgb.LGBMClassifier() clf = RandomizedSearchCV( model, parameters, fit_params={ 'early_stopping_rounds':20, 'eval_set':[(X,y)] }, cv=cv ) My questions are: 1- Does it make sense to use early stopping during randomized search? 2- Do you know a way I could do it? 3- If not, is it a good idea to use randomized search without early stopping and train a new model with early stopping using the best parameters resulting from randomized search? Bonus Question: Does it make sense to run randomized search with f1_macro scoring from sklearn instead of multilogloss in case of imbalanced classes? submitted by /u/returnname35 [link] [comments]  ( 9 min )
    [D] What do you put in your lab notes?
    I'm working my way through various tweaks to a ML pipeline, and I've started keeping short lab notes in a markdown file with just the time, a brief summary of changes, and my observations on training metrics or anything else interesting on a training run. I've also started copying a snapshot of the Python source code to the tensorboard directory, which has saved me a lot of headache. I was wondering how other people keep lab notes, and especially what you find useful to record and how you structure the notes. submitted by /u/hazard02 [link] [comments]  ( 9 min )
    [R] [D] Machine learning model to predict deformation of 2D object
    Hello, I am currently working on predicting 2D deformations of objects. These objects are available as 2D contours in my code. I am splitting these contours into 1000 points with an equal distance in the direction of the x axis. I have about 70 data entries. The following picture shows one of these objects: Comparison before and after The red data series contains the points before and the blue series contains the points after the deformation. My model should take in a series of coordinates before the deformation. Using this information the model should predict the coordinates after the deformation. I have tried using the LSTM Model from keras. Unfortunately I wasn't able to produce useful results. The way I structured my data is the following: [ [ [x1, y1], [x2, y2], [x3, y3], ... 1000 coordinate pairs ], [ [x1, y1], [x2, y2], ... ], .... 70 entries ] The structure for the input and the output series is the same. When trying to train the model I have a very low loss and low validation loss as well: Overview during training of model The test loss is also quite similar: Overview test loss However when looking into the predictions I get results like these: Visualized prediction after training The prediction is not close to what it should be like. Also the prediction seems to not change even when changing the input. ​ Do you have an idea about why my ML model does not work? Are there examples on this topic available? Should I change my approach in any way? Thank you in advance! Any help is appreciated! ​ If you need my jupyter-notebook, it would be great if somebody could tell me, how to link files on Reddit :) submitted by /u/InitiativeGlass4701 [link] [comments]  ( 10 min )
    [P] Equinox (1.3k stars), a JAX library for neural networks and sciML
    Hey folks! I wanted to advertise Equinox -- my now-surprisingly-popular ( :D ) JAX library for numerical models. These days that often means "neural networks", but I like to emphasise that this also includes ODEs/SDEs/linear solves, etc. Here's the GitHub link: https://github.com/patrick-kidger/equinox For those already using JAX, then Equinox is interesting because (a) it ships with a NN library, and (b) this is built around the idea that "everything is a pytree", which makes things easy to reason about and easy to compose. Furthermore (c) Equinox offers advanced tools like true runtime errors, out-of-place pytree surgery, and checkpointed while loops, and AFAIK in the JAX ecosystem these are unique to Equinox. For those most familiar with PyTorch: for many use cases (sciML in particular), JAX has a much stronger compiler, more advanced autodiff, etc. And whilst JAX itself is akin to the torch.* namespace, libraries like Equinox are then akin to the torch.nn.* namespace. Because of its speed and features, right now JAX+Equinox is my favourite approach to numerical computing. So I'd love for some more people to try it. What do you think? submitted by /u/patrickkidger [link] [comments]  ( 9 min )
    [D] Distributed training on a local cluster
    I want to make use of a local rack for running both training and serving jobs. I have looked into using something like Kubeflow, but I have some questions. -Does Kubeflow offer a suitable solution for running tasks across multiple machines? (Either data parallel or model parallel tasks). -How does resource provisioning work with it? Is it able to automatically select the machines that best suits the resource requirements or does it require the user to select where to run the job? Is it able to scale vertically/horizontally? Thanks in advance. submitted by /u/omegalul3000 [link] [comments]  ( 9 min )
    [P] Hydralette: Simple but powerful configs based on dataclasses
    Hi r/ML, i want to share a little side project of mine: hydralette. I mainly built this for my own work but thought why not get some feedback and potentially make someone else's work a little easier as well. I think we all agree that having a flexible configuration is crucial to successful ML experimentation. There are a million python config libraries out there, some dedicated to configs like hydra and others that support configs as a convenience feature like transformers.HfArgumentParser. So why did I decide to write yet another library? First off, I can say that I never really liked the way huggingface handles configs. All options are on a single level with tons of dependencies between them, some only taking effect if a combination of others is given. General approach to configs asid…  ( 11 min )
  • Open

    Build a generative AI-based content moderation solution on Amazon SageMaker JumpStart
    In this post, we introduce a novel method to perform content moderation on image data with multi-modal pre-training and a large language model (LLM). With multi-modal pre-training, we can directly query the image content based on a set of questions of interest and the model will be able to answer these questions. This enables users to chat with the image to confirm if it contains any inappropriate content that violates the organization’s policies. We use the powerful generating capability of LLMs to generate the final decision including safe/unsafe labels and category type. In addition, by designing a prompt, we can make an LLM generate the defined output format, such as JSON format. The designed prompt template allows the LLM to determine if the image violates the moderation policy, identify the category of violation, explain why, and provide the output in a structured JSON format.  ( 13 min )
    How Carrier predicts HVAC faults using AWS Glue and Amazon SageMaker
    In this post, we show how the Carrier and AWS teams applied ML to predict faults across large fleets of equipment using a single model. We first highlight how we use AWS Glue for highly parallel data processing. We then discuss how Amazon SageMaker helps us with feature engineering and building a scalable supervised deep learning model.  ( 10 min )
    Optimize deployment cost of Amazon SageMaker JumpStart foundation models with Amazon SageMaker asynchronous endpoints
    In this post, we target these situations and solve the problem of risking high costs by deploying large foundation models to Amazon SageMaker asynchronous endpoints from Amazon SageMaker JumpStart. This can help cut costs of the architecture, allowing the endpoint to run only when requests are in the queue and for a short time-to-live, while scaling down to zero when no requests are waiting to be serviced. This sounds great for a lot of use cases; however, an endpoint that has scaled down to zero will introduce a cold start time before being able to serve inferences.  ( 10 min )
  • Open

    Distal Adversarial Examples Against Neural Networks in PyTorch
    Out-of-distribution examples are images that are cearly irrelevant to the task at hand. Unfortunately, deep neural networks frequently assign random labels with high confidence to such examples. In this article, I want to discuss an adversarial way of computing high-confidence out-of-distribution examples, so-called distal adversarial examples, and how confidence-calibrated adversarial training handles them. The post Distal Adversarial Examples Against Neural Networks in PyTorch appeared first on David Stutz.  ( 5 min )
  • Open

    Rethinking trust in direct messages in the AI era
    Microsoft researchers are proposing a new way to ensure greater trust and accountability in email, texts, direct messages on social platforms, even phone calls, to help mitigate sophisticated threats from AI-related scams and fraud. The post Rethinking trust in direct messages in the AI era appeared first on Microsoft Research.  ( 14 min )
  • Open

    The Halo Effect: AI Deep Dives Into Coral Reef Conservation
    With coral reefs in rapid decline across the globe, researchers from the University of Hawaii at Mānoa have pioneered an AI-based surveying tool that monitors reef health from the sky. Using deep learning models and high-resolution satellite imagery powered by NVIDIA GPUs, the researchers have developed a new method for spotting and tracking coral reef Read article >  ( 6 min )
    A Perfect Pair: adidas and Covision Media Use AI, NVIDIA RTX to Create Photorealistic 3D Content
    Creating 3D scans of physical products can be time consuming. Businesses often use traditional methods, like photogrammetry-based apps and scanners, but these can take hours or even days. They also don’t always provide the 3D quality and level of detail needed to make models look realistic in all its applications. Italy-based startup Covision Media is Read article >  ( 7 min )

  • Open

    AI is a Looming Damnation
    submitted by /u/Powerful-Pumpkin-938 [link] [comments]  ( 9 min )
    Natural Language Processing Question
    Hello, I am learning about natural language processing now. Technically, is this a way for a computer to input language of a person and then convert it into machine code (0s and 1s)? Or, is this a way to turn human language into some computer language like Python, and then turn into machine code as a second step? I am assuming that NLP has only just recently become widely used (like in Chat GPT). Was it a huge jump to go from a machine understanding a computer programming language like Python to a machine understanding ordinary human language? Why was it so much more difficult to train computers to understand the later? Thanks! submitted by /u/NoahsArkJP [link] [comments]  ( 9 min )
    Can someone tell me where I can get Runway Gen-2 code? I tried Github but found nothing
    Title submitted by /u/ICWiener6666 [link] [comments]  ( 9 min )
    Help with finding a tool for 3d image effects.
    Hello all - I'm looking to track down a tool that was able to create a zoom effect that looks three dimensional, example below. https://www.instagram.com/reel/CwLB1XsNK0X/?igshid=MmU2YjMzNjRlOQ== I've searched my usual spots for some different image editing tools and looked at some video ones as well, but I can't quite figure it out. Anyone familiar with a tool that could do something like that? Thanks in advance. submitted by /u/Lys0L [link] [comments]  ( 9 min )
    Is this company Legit? Any more info on the early access release of this AI?
    Sounds like the stuff I somgwrite about submitted by /u/Niu_Davinci [link] [comments]  ( 9 min )
    Can AI Writing Boost Your Mood and Mind? My Personal experience.
    Have you ever wondered if AI writing can make you feel better in your head? I would like to discuss about how AI writing can put a smile on your face as it is my personal experience. 1. Stress-Free Writing Writing can be stressful, especially when you're not sure where to start. AI writing tools can be your stress-busters. They help you begin by giving you ideas and suggestions. So, no more staring at a blank screen in frustration! 2. Beating the Writer's Blues We all know that feeling when words just won't flow. AI can be your brainstorm buddy. It tosses out ideas like confetti at a party, sparking your creativity when you need it most. Goodbye, writer's block! 3. Making Your Writing Shine Typos and messy sentences can be a downer. AI can be your proofreader, catching those pesky er…  ( 10 min )
    One-Minute Daily AI News 9/3/2023
    Amazon India is developing a generative artificial intelligence (AI) tool called SahAI (help/assist) for its business partners to help them with the backend of any particular product.[1] A robot moves a toy package of butter around a table in the Intelligent Robotics and Vision Lab at The University of Texas at Dallas. With every push, the robot is learning to recognize the object through a new system developed by a team of UT Dallas computer scientists.[2] “What is my purpose?” – “You pass butter”. Mustafa Suleyman, Google DeepMind’s co-founder and chief executive of Inflection AI, told the Financial Times that the US should use their chip leadership to enforce minimum global standards for the use of AI.[3] Model who never ages: Noonoouri becomes first digital artist to be signed by Warner Music.[4] Sources: [1] https://www.thehindu.com/sci-tech/technology/amazon-working-on-a-generative-ai-to-help-small-businesses-in-india/article67255325.ece [2] https://www.nanowerk.com/news2/robotics/newsid=63572.php [3] https://www.finextra.com/newsarticle/42878/google-deepmind-co-founder-argues-us-should-set-ai-global-standards---ft [4] https://www.thenationalnews.com/arts-culture/music-stage/2023/09/02/model-who-never-ages-noonoouri-becomes-first-digital-artist-to-be-signed-by-warner-music/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    An audiobook entirely created from A.I.
    This story, it's narrator and even the cover art are all made by Artificial Intelligence. The only human contribution was adding the Text to the book cover and the prompts used to produce the story. https://youtu.be/tZgq9N9RCo0 submitted by /u/BermaidMutter [link] [comments]  ( 9 min )
    AI-Generated Voice Deepfakes Pose New Threat to Bank Security
    Scammers are now using AI to create realistic voice deepfakes, aiming to trick people into transferring money. By mimicking real customer voices, this new type of voice fraud attempts to exploit bank security systems and deceive call center agents. To make sure you're updated about the latest AI trends, look here first. Increasing prevalence and sophistication of voice frauds A rise in AI-generated voice frauds has been noted this year, with one major case featuring an investor in Florida whose voice was synthetically duplicated to deceive his bank. Voice authentication vendor Nuance detected its first successful deepfake attack on a financial services client late last year. These scams are facilitated by the wide availability of voice samples online, coupled with the growth of AI capabilities and hackers' access to stolen bank account details. Defending against evolving AI threats Currently, only a small percentage of fraud calls to large financial companies are AI-generated. Most attacks have targeted credit card service call centers. Fraudsters are advancing their techniques, now able to convert speech to a specific target's voice in real-time using advanced AI systems like Microsoft's VALL-E. With most of these security measures focusing on call centers and automated systems, individual calls to high-ranking officials remain a vulnerability. (source) P.S. If you like this kind of analysis, I write a free newsletter that keeps you updated with the most relevant news and research in AI. Join professionals from Google, Meta, and OpenAI who are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
  • Open

    [D] What is the difference between self-taught learning and self-supervised learning?
    I came across a paper by Andrew Ng "https://ai.stanford.edu/~hllee/icml07-selftaughtlearning.pdf" ,with title " Self-taught Learning: Transfer Learning from Unlabeled Data " I am not an expert on this topic, but I feel it is really close to what SimCLR or MoCO are trying to do. Can someone provide guidance on what different it is between self-taught learning and self-supervised learning? submitted by /u/AaronSpalding [link] [comments]  ( 9 min )
    [R] A brain-inspired algorithm that mitigates catastrophic forgetting of artificial and spiking neural networks with low computational cost - Chinese Academy of Sciences 2023
    Paper: https://www.science.org/doi/10.1126/sciadv.adi2947#abstract Code: https://zenodo.org/record/8037309 Abstract: Neuromodulators in the brain act globally at many forms of synaptic plasticity, represented as metaplasticity, which is rarely considered by existing spiking (SNNs) and nonspiking artificial neural networks (ANNs). Here, we report an efficient brain-inspired computing algorithm for SNNs and ANNs, referred to here as neuromodulation-assisted credit assignment (NACA), which uses expectation signals to induce defined levels of neuromodulators to selective synapses, whereby the long-term synaptic potentiation and depression are modified in a nonlinear manner depending on the neuromodulator level. The NACA algorithm achieved high recognition accuracy with substantially reduced computational cost in learning spatial and temporal classification tasks. Notably, NACA was also verified as efficient for learning five different class continuous learning tasks with varying degrees of complexity, exhibiting a markedly mitigated catastrophic forgetting at low computational cost. Mapping synaptic weight changes showed that these benefits could be explained by the sparse and targeted synaptic modifications attributed to expectation-based global neuromodulation. https://preview.redd.it/5lcx3sn8ramb1.jpg?width=711&format=pjpg&auto=webp&s=4431b81708bb9ab98e6351f4b979897ad8244ed9 https://preview.redd.it/vgsuqsn8ramb1.jpg?width=718&format=pjpg&auto=webp&s=f0602185fcb0dc6ec29308f77f1db77a4f4a562d https://preview.redd.it/hpfuftn8ramb1.jpg?width=709&format=pjpg&auto=webp&s=545f4fab3033cb68637052e7ff2c4775a12a7b99 https://preview.redd.it/7plm0tn8ramb1.jpg?width=714&format=pjpg&auto=webp&s=b138b1c43a2078297b69c09d26de013865629e77 https://preview.redd.it/uc6tnrn8ramb1.jpg?width=703&format=pjpg&auto=webp&s=fcf747b2515fbf6e78b1ef7aa66ce9ca4d223cd3 submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    Faster, long range transformer [R]
    Many natural language processing tasks benefit from long inputs, but processing long documents with Transformers is expensive -- not only due to quadratic attention complexity but also from applying feedforward and projection layers to every token. However, not all tokens are equally important, especially for longer documents. CoLT5, a long-input Transformer model that builds on this intuition by employing conditional computation, devoting more resources to important tokens in both feedforward and attention layers. CoLT5 can effectively and tractably make use of extremely long inputs, showing strong gains up to 64k input length. In this video, we walk through the ColT5 paper and explain what is T5, longT5, UL2 and PEGASUS, then discuss how ColT5 has advantage over previous methods for few-shot and 1-shot tasks. https://youtu.be/8KCQQtXje2g?si=ecbvnFPlhGP01aOt submitted by /u/MRMohebian [link] [comments]  ( 9 min )
    [R] YaRN: Efficient Context Window Extension of Large Language Models - Nous Research 2023 - Open source allows context windows of up to 128k!
    Paper: https://arxiv.org/abs/2309.00071 Github: https://github.com/jquesnelle/yarn Very informative Reddit discussion: https://www.reddit.com/r/LocalLLaMA/comments/166jik4/128k_context_llama_2_finetunes_using_yarn/?utm_source=share&utm_medium=web2x&context=3 Twitter: https://twitter.com/EnricoShippole/status/1697317625116742119?s=20 Abstract: Rotary Position Embeddings (RoPE) have been shown to effectively encode positional information in transformer-based language models. However, these models fail to generalize past the sequence length they were trained on. We present YaRN (Yet another RoPE extensioN method), a compute-efficient method to extend the context window of such models, requiring 10x less tokens and 2.5x less training steps than previous methods. Using YaRN, we show that LLaMA models can effectively utilize and extrapolate to context lengths much longer than their original pre-training would allow, while also surpassing previous the state-of-the-art at context window extension. In addition, we demonstrate that YaRN exhibits the capability to extrapolate beyond the limited context of a fine-tuning dataset. We publish the checkpoints of Llama 2 7B/13B fine-tuned using YaRN with 64k and 128k context windows at https://github.com/jquesnelle/yarn . https://preview.redd.it/tnovsbpjiamb1.jpg?width=1354&format=pjpg&auto=webp&s=ce098b3071285f9f64d99312a98999de8b625bfe https://preview.redd.it/j10sicpjiamb1.jpg?width=997&format=pjpg&auto=webp&s=95bbc6d70759ef7ccdf6bccee0c2a2f98ebda52b https://preview.redd.it/ve710dpjiamb1.jpg?width=1380&format=pjpg&auto=webp&s=05f53117bcf648e330fa6ac148746484dca9fb1b ​ ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [R] DenseDiffusion: The Game-changing, Training-free Technique in Text-to-Image Generation
    Overcoming present challenges in text-to-image models, DenseDiffusion is the latest advancement ensuring enhanced image quality based on scene descriptions. Developed specifically to handle complex captions, it brings a new era in dense captioning. https://preview.redd.it/v5oa5suwfamb1.png?width=2000&format=png&auto=webp&s=17fbcc702ee21a41cb356a7d0e38d710a8c048c3 If you want to stay on top of the latest trends and insights in AI, look here first. Why is it noteworthy? It addresses the issues with existing techniques where users face inconsistencies when dictating the arrangement of elements within generated images using textual prompts. DenseDiffusion is training-free, unlike existing methods like "Make-aScene" and "Latent Diffusion Models," which are computationally intensive and r…  ( 9 min )
    [D] video data in image classification
    Let's say your training a simple CNN for a classification problem. An example would be a model that is supposed to decide if a person is male or female based on facial images. What is your experience regarding image sequences from videos in the training datasets? My intuition is, that the added information to the dataset from one video isn't proportional to the number of frames. The network probably can't learn much more from 30 frames with little variation in comparison to a single image (at least if you use augmentations). What do you think about this? Or do you even know any research in the direction of this question? submitted by /u/seba07 [link] [comments]  ( 9 min )
    [D] Strongest LLM for Writers/Editors
    Hey all, I'm a screenwriter that's curious about ML/AI tech and its applications to my industry. I'm wondering what the current best product is for writers and editors. Specifically, I'm curious if there's a product that can "edit" longform text - say, to trim a screenplay down from 140 pages to 120, while retaining style, plot, and narrative intent. Are there any products like that? Forgive me if this is too basic; I've only dabbled in ChatGPT and MidJourney to see what the fuss is about. Thanks in advance! submitted by /u/cesrep [link] [comments]  ( 9 min )
    [P] We're building the first LLM marketplace to connect developers with teams, investors, and projects
    There is so much going on right now in AI and machine learning. But there isn't a concise place to find experts, teams, and amazing projects all in one place. That is why we are building Bazaar, the first ever LLM marketplace. We will be inviting slowly making sure we have enough members on each side of the marketplace. https://www.llmbazaar.com/ submitted by /u/husky_misconception [link] [comments]  ( 9 min )
    [Project] Should i use the compile() function when using a custom trainer class in tensorflow?
    I'm writing a neural network for super resolution but it's one of my first projects and I didn't really understand what compile() is used for. I specify the optimizer, the loss and the accuracy metrics in the trainer class and then I just call my train method on my model. Should I still use the compile function? I'm following this template for the project structure https://github.com/jinh0park/Tensorflow-2.0-Project-Template/tree/master submitted by /u/petrogass [link] [comments]  ( 9 min )
    [R] Learning to Generate Semantic Layouts for Higher Text-Image Correspondence in Text-to-Image Synthesis
    Project page: https://pmh9960.github.io/research/GCDP/ https://i.redd.it/9uz2wt3ba9mb1.gif submitted by /u/yeolj0o [link] [comments]  ( 9 min )
    [Discussion] Segmentation Suggestions for Structured (and Deeply Nested) Bulleted Documents
    My question is more focused on the pre-processing side, rather than the training side of things. I have a local RAG Q&A pipeline set up for personal documents (local regulations, technical manuals, stuff like that), and I'm looking for ways to improve it. All the documents I'm working with are consistently structured, with nested bullets of varying depth making up most of the structure. So far I've been manually writing/tweaking a python script I wrote to recursively extract the nested bullets and duplicate their hierarchy parents' content for each of the inner-most bullets, that way each bullet has all the contextual content it needs to be valuable in a vacuum. So something like: (a) 1. A. B. Would turn into: (a) + 1. + A. (a) + 1. + B. This works well in the sense that my LLM does a wonderful job answering my questions and citing the right sources, but the lion's share of my work goes into the tweaking of my parser scripts, or creating new ones entirely. I've played around with semantic segmentation via embedding models, but it doesn't really work here since I'm trying to retain the nested structure of the document for citation accuracy. Does anyone have any ideas for ots solutions that address this kind of thing? I can't be the only person who has run into this type of problem, but I've been having a really hard time finding relevant libraries/software that can even get me 80% of the way there. Also, I'm totally happy to hear what you guys have done and how's it's worked out/what walls you've hit! Edit: I suppose I should have included my current attempt as well, so it's doesn't look like I'm treating this subreddit like Google lol https://gist.github.com/apettina/76de292d6d24ed3d0128b87847706b18 submitted by /u/RedditAppSucksDicks [link] [comments]  ( 10 min )
    [Discussion] How to implement Data Contracts generically? Seeking advice from data contract users.
    Hey folks, it's me the dlt builder again. I have questions about data contracts! Schema evolution, where the schema of the destination evolves based on incoming data is nice for ingesting transactional data. However, there are scenarios where we might not want this automatic evolution. For example, when other parts of our infrastructure require a fixed schema or when we want to store only data that conforms to the current schema. This is where a data contract comes into play. Our plan is to implement a straightforward version of this concept initially. We're considering introducing settings on the pipeline to control schema evolution, and here are some modes we're thinking about: Evolve (Default): The current behavior where the schema adapts to incoming data. Freeze-and-Trim: Freez…  ( 10 min )
    [P] 🤵🔥 Classy-Fire 🔥🤵 - pretrained text classification using LLM APIs (github.com/microsoft)
    Classy-fire is a pretrained multiclass text classification approach that leverages Azure OpenAI's LLM APIs using clever parameter tuning and prompting for classification. Why? Tired of having to beg your LLM to pick from a set of options / actions? Tired of working hard on cleaning and parsing its responses to trigger a flow? Struggling to strip unhelpful prefixes (such as "Sure! " or "I am just a language model!")? Having to wait on retries in cases of unexpected outputs? Getting random responses on the same query? Need a "quick and dirty" text classifier? Don't have enough training data? submitted by /u/shayben [link] [comments]  ( 9 min )
    [D] Are there any projects working with large compute clusters looking for volunteers?
    I've been an ML/software engineer for a bit over 7 years now, and am looking for a new job. It seems like most of the job postings I see around want experience with large compute clusters, but my work has always been in compute-restricted domains (robotics, on-prem deployments, etc.). I'm looking broaden my skillset and get some experience with distributed computing. Does anyone know of open-source or otherwise public projects that work with compute clusters like this that are looking for volunteers? I'm happy to put aside an hour or so a day to work on an interesting project. submitted by /u/Flag_Red [link] [comments]  ( 9 min )
    [D] - Two objections to Iris van Rooij's paper saying that it is provably intractable to simulate human intelligence via any machine learning algorithm that samples from human actions.
    https://psyarxiv.com/4cbuv/ The short of the paper is they show that an AI algorithm that can only learn via sampling from human action is unable to tractably simulate human behavior. I have seen papers like this one by u/alcanthro questioning the validity of the result, but I want to point out two objections to the paper that stand even if the result is true. ​ 1 - It only seems to apply for AIs trained to mimic humans via sampling human behavior: The paper assumes the AI is trained via an arbitrary machine learning algo M that samples from possible human behaviors in given situations. This matches pretty well to how a lot of LLMs are pretrained (guess the next token), but doesn't seem to apply to any sort of reinforcement learning, since in those situations you are not training the …  ( 11 min )
    [P] ReAct: "Recurrence for Adaptive Computation" can lead to OOD length-extrapolation
    This was a small project I was working upon which adds a recurrent prior to attention-based models. This allows integrating an adaptive-computation mechanism, leading to much better length-extrapolation capabilities (compared to vanilla transformers). On some tasks, I'm able to OOD extrapolate to quite an appreciable extent! Its also (relatively) quite parallelizable with slightly different training regimes - thus, hopefully being scalable as well. Being lightweight, it might be useful for inferencing as it saves on memory (trading off compute instead). It's interesting to think that MHSA might contain an implicit inductive bias that prevents extrapolation. Replacing that with other variants helps a lot - I go in detail in the writeup! Twitter summary: https://twitter.com/awesome_ruler_/status/1698668965612917112?s=20 Writeup/Blogpost: https://dripfeedofideas.notion.site/dripfeedofideas/ReAct-bef052956a0d45f29fb5a5383e7d737d GitHub repo: https://github.com/neel04/ReAct submitted by /u/Competitive-Rub-1958 [link] [comments]  ( 9 min )
    [D] Current opinions on the information bottleneck principle for neural networks?
    A while back, the IB principle (https://arxiv.org/abs/1503.02406) made a few waves as a promising framework to understand/study deep neural networks. But I recall a series of follow up works (notably https://openreview.net/forum?id=ry_WPG-A-) that called a lot of the results into question, and (I think?) people drifted away from it. I saw this recent paper (https://arxiv.org/abs/2304.09355) on the IB and self-supervised learning, and it got me wondering what the current views are as to how useful/accurate the IB view of deep learning is? submitted by /u/Tea_Pearce [link] [comments]  ( 9 min )
    [Project][Discussion] What could I use to create a UI like AIChain?
    I was looking over this paper https://arxiv.org/abs/2110.01691 called AIChains that deals with an interactive chaining method to interact with LLMs. I could not find an associated codebase with that paper. If I wanted to create a similar UI like theirs anything you would recommend? More specifically, if I want to replicate the paper in 3 months full time (as a student with some experience in ML), what would be the best approach to the UI part of the paper. What if deployment is a concern? I was intially thinking of simple python frameworks like PySimpleGui, or maybe something more comprehensive like PyQt. I am rather unfamiliar with more common web frontend frameworks, but if there are suggestions that make such a Graph/Diagram based User interface easy to implement, I am open to them. submitted by /u/BasisCompetitive6275 [link] [comments]  ( 9 min )
    [D] how to learn Stochastic Differential Equations for diffusion model?
    There have many blogs and papers disscuss SDE for diffusion model: Stochastic Differential Equations and Diffusion Models https://www.vanillabug.com/posts/sde/ Perspectives on diffusion https://sander.ai/2023/07/20/perspectives.html On the Mathematics of Diffusion Models https://arxiv.org/abs/2301.11108 But i can't find blog or book to explain Stochastic Differential Equations, it seems complex, even after i have learned Calculus and Ordinary Differential Equations and Partial Differential Equations, i still can't understand SDE, especially the SDE Perspective on diffusion. So Do you know some blogs or books explain SDE intuitive like betterexplained.com/ and mathsisfun.com/ ? submitted by /u/ghosthamlet [link] [comments]  ( 9 min )
    [D] How are Mixture Of Expert models trained in conjunction with Transformers?
    How MoEs (sparsely gated ones) are trained appear to be rather opaque from looking at literature (e.g. GLAM and similar papers). From my intuition it would make sense that it works either by: Each expert being trained on a subset of data (the data they are supposed to have expertise in) to predict a token given a previous token, or to predict a token given a contextual embedding. This would mean the expert MLPs are frozen, and the only thing concerning the experts that we train with the transformer is the gating mechanism. or The experts are trained in the same training loop as the transformer (e.g. backprop over the whole network), but that each of the experts are only trained on a subset of the data corresponding to their expertise (e.g. as we perform the training loop and we run upon data from our math dataset, then we backprop through the math expert mlp) ​ Could anyone help me resolve my confusion and point me in the right direction for how these are trained? Thanks! submitted by /u/SorasNobody [link] [comments]  ( 9 min )
    [D] (Advice) Remote work in ML/DL/Data Science
    I'm from India and I've started learning and building my portfolio in Machine Learning/Deep learning. Currently, I'm doing "Practical Deep learning using fast.ai and pytorch" course. In my university, there are not a lot of companies visiting for campus placement this year so I decided to go on the offcampus job hunt. I'm a final year student (in masters but bachelors was unrelated to CS) and no work experience. I have further personal goals for which I'd need a steady and good income. I decided if I could get a remote job it would be really beneficial for me as my living costs would be saved and I'll be paid much more than what India offers freshers(since I'll be paid in dollars or euros). However, I need advice in various domains: 1. Should I focus on one of ML/DL/ Data science or multiple ? 2. Any resources that could help me learn ? 3. Projects that help me stand out from the crowd? 4. Where can I start looking for remote work(websites, etc)? 5. Any other personal advice is appreciated! Thank you for taking the time to read my post :) submitted by /u/Lazy_Guidance_5151 [link] [comments]  ( 9 min )
    [D] Finetune pretrained ViT
    Hello everyone, In deep learning finetuning pre-trained model was performed by taking some pretrained models like resnet, vgg and unfreezing some of it's final layers. Is it the same when finetuning pretrained ViT models? Or do we have to take pretrained ViT and train all the parameters on our own data ? On this tutorials https://theaisummer.com/hugging-face-vit/, they have not freezed any pretrained layers. submitted by /u/Bishwa12 [link] [comments]  ( 9 min )
  • Open

    NN underperforming greedy algorithms
    So apparently NNs may not outperform simple greedy algorithms in some combinatorial optimization problems Never thought could be the case. https://arxiv.org/pdf/2206.13211.pdf… Modern graph neural networks do worse than classical greedy algorithms in solving combinatorial optimization problems like Maximum Independent Set. https://arxiv.org/pdf/2210.00623.pdf… Inability of a graph neural network heuristic to outperform greedy algorithms in solving combinatorial optimization problems like Max-Cut submitted by /u/vniversvs_ [link] [comments]  ( 9 min )
    Introducing Refact Code LLM: 1.6B State-of-the-Art LLM for Code that Reaches 32% HumanEval
    submitted by /u/nickb [link] [comments]  ( 9 min )
    Predicting Optimal Temperature in The Transmission System using ML
    submitted by /u/Antique-human6894 [link] [comments]  ( 9 min )
  • Open

    "ChessGPT: Bridging Policy Learning and Language Modeling", Feng et al 2023
    submitted by /u/gwern [link] [comments]  ( 9 min )
    Looking for open PhD positions
    Hi all, I have just completed my MSc and am looking for open PhD positions (preferably funded) in RL to join. My masters thesis was on Hierarchical RL and skill discovery, so that’s the domain am mostly interested in since I have spent quite some time researching it but also open to other interesting avenues. If there are any such positions available at your workplace/lab please let me know. Thanks….. submitted by /u/FreakedoutNeurotic98 [link] [comments]  ( 9 min )
    The reason for using a policy based learning method
    I am reading Sutton's RL chapter on Policy Gradients (13.1) and came across the following paragraph. Can someone please explain it to me - " Finally, we note that the choice of policy parameterization is sometimes a good way of injecting prior knowledge about the desired form of the policy into the reinforcement learning system. This is often the most important reason for using a policy-based learning method. ". Is he referring to some kind of Bayesian technique? I'd highly appreciate some examples here. submitted by /u/Academic-Rent7800 [link] [comments]  ( 9 min )
    Continuation of Key Papers in DRL from OpenAI Spinning UP
    Hey, I've been going through the papers curated by people behind OpenAI Spinning Up and I've recently started thinking what the list would look like in 2023 if OpenAI hadn't abandoned it. Do you folks have any suggestions for DRL papers from 2019, 2020, …, up to now? submitted by /u/spoiled-mylk [link] [comments]  ( 9 min )
    Reinforcement learning Rivals of Aether
    i want to create an ai for Rivals of Aether to see how far it could get in abyss mode and if it could beat 3 9th level cpus on a team. i have no idea how to do this. I was thiking for abyss mode, it could get rewards for finishing waves, and get more reward for doing them with minimal damage. ​ submitted by /u/Additional_Ad9093 [link] [comments]  ( 9 min )
  • Open

    Jordan normal form: 1’s above or below diagonal?
    Given a square complex matrix A, the Jordan normal form of A is a matrix J such that and J has a particular form. The eigenvalues of A are along the diagonal of J, and the elements above the diagonal are 0s or 1s. There’s a particular pattern to the 1s, giving the matrix J […] Jordan normal form: 1’s above or below diagonal? first appeared on John D. Cook.  ( 6 min )
    Eigenvectors of the DFT matrix
    When is the discrete Fourier transform of a vector proportional to the original vector? And when that happens, what is the proportionality constant? In more formal language, what can we say about the eigenvectors and eigenvalues of the DFT matrix? Setup I mentioned in the previous post that Mathematica’s default convention for defining the DFT […] Eigenvectors of the DFT matrix first appeared on John D. Cook.  ( 6 min )
  • Open

    NVIDIA CEO Meets with India Prime Minister Narendra Modi
    Underscoring NVIDIA’s growing relationship with the global technology superpower, Indian Prime Minister Narendra Modi met with NVIDIA founder and CEO Jensen Huang Monday evening. The meeting at 7 Lok Kalyan Marg — as the Prime Minister’s official residence in New Delhi is known — comes as Modi prepares to host a gathering of leaders from Read article >  ( 5 min )

  • Open

    [R] How susceptible are LLMs to Logical Fallacies?
    paper https://arxiv.org/abs/2308.09853 abstract. This paper investigates the rational thinking capability of Large Language Models (LLMs) in multi-round argumentative debates by exploring the impact of fallacious arguments on their logical reasoning performance. More specifically, we present Logic Competence Measurement Benchmark (LOGICOM), a diagnostic benchmark to assess the robustness of LLMs against logical fallacies. LOGICOM involves two agents: a persuader and a debater engaging in a multi-round debate on a controversial topic, where the persuader tries to convince the debater of the correctness of its claim. First, LOGICOM assesses the potential of LLMs to change their opinions through reasoning. Then, it evaluates the debater’s performance in logical reasoning by contrasting the scenario where the persuader employs logical fallacies against one where logical reasoning is used. We use this benchmark to evaluate the performance of GPT-3.5 and GPT-4 using a dataset containing controversial topics, claims, and reasons supporting them. Our findings indicate that both GPT-3.5 and GPT-4 can adjust their opinion through reasoning. However, when presented with logical fallacies, GPT-3.5 and GPT-4 are erroneously convinced 41% and 69% more often, respectively, compared to when logical reasoning is used. Finally, we introduce a new dataset containing over 5k pairs of logical vs. fallacious arguments. The source code and dataset of this work are made publicly available. GPT3.5 vulnerable to false information generated by itself! submitted by /u/Amir-AI [link] [comments]  ( 9 min )
    [Discussion] What was your biggest oops with a model or analysis that made it (or almost made it) into production?
    I'm asking because it seems like when I review other people's work I very regularly catch a tiny coding misstep that has HUGE downstream implications. I'm sure my own work is not exempt either. Some examples: "At this step you're saying you encode responders as 1 and non-responders as 0 but you actually did it the other way around." "That groupby statement isn't doing what you think it's doing." "When you created your target variable by labeling people with this ratio >= 30%, you accidentally failed to capture a ton of actual responders, because the floating-point arithmetic used to derive this column is calculating people with actual values of 0.30 as 0.2999999999999998." Come on guys, let's hear it. submitted by /u/WartimeHotTot [link] [comments]  ( 9 min )
    [R] Meta's DINOv2 and FACET sets the bar in computer vision model fairness
    Meta has recently unveiled DINOv2, its cutting-edge computer vision model, and FACET, a comprehensive benchmark to ensure AI fairness. These developments promise improved automation and better inclusivity in the AI sector. If you want to stay on top of the latest trends and insights in AI, look here first. https://i.redd.it/jeojm1qew3mb1.gif DINOv2 for advanced visual tasks Meta has made the powerful DINOv2 model available under the Apache 2.0 license, employing self-supervised learning to enhance image segmentation and depth estimation. This broader use model encourages further innovation and practical application in the computer vision community, driving progress in the AI industry. FACET for enhanced AI fairness Given the inherent difficulty and risks in ensuring fairness in computer vision, Meta introduced FACET. FACET has been developed to benchmark fairness across computer vision models performing tasks such as detection or classification, considering a wide array of demographic attributes. This revolutionary tool enables a better understanding of potential biases in AI models, helping to address fairness and robustness concerns. Wider implications Preliminary studies indicate performance disparities across some demographic groups within computer vision models. FACET allows researchers to track these divergences and monitor the implementation of corrective measures. Meta actively encourages researchers to use FACET for fairness benchmarking in other visual/multimodal tasks. For instance, the DINOv2 model's performance was analyzed with FACET — facilitating insights into potential biases. (source) P.S. If you like such analysis, I write a free newsletter tracking significant news and research in AI. Professionals from Google, Meta, and OpenAI are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    [P][D] VAE using one-hot-encoding input, problem with optimising and getting good results
    Introduction to the problem I will provide you with link to github code so you can see both input dataset, how my onehot encoding and decoding works and implementation of VAE, with current reconstruction_loss and kl_loss fuctions that make my loss function. dataset - amp.csv onehotencoding -- tools.py main code -- VAE-onehot-testing.ipynb https://github.com/aronta/Master-thesis-Generating-de-novo-peptides-using-variational-autoencoder-model/blob/main/VAE-onehot-testing.ipynb Firstly, just to explain dataset. I have sequences of peptides represented with onehot encoded sequences which I am using as input for my model. Current VAE model, both encoder and deocder are based on LSTM layer as a main way for this VAE to learn connections between inputed sequences and to make sense of it all. Main Issue Issue is that the latent space im getting doesn't look good no matter what i do. (pictures of plots are on link). So i have tried scalling kl loss (and also warming it up -- because many papers say its a good way) but it doesn't change the end result. Maybe there is problem in the implementation of the VAE, i am realy not sure. The main goal would be (like in all VAE implementations) generating new sequences from latent spaces that make sense, opposed to the current outputs that im getting. My guess There is a problem with optimizing loss function, but i could be completely wrong (maybe the model is wrong for the input I have, or onehotencoding isn't even a good way to represent data entering LSTM layer). submitted by /u/Yupgrade [link] [comments]  ( 9 min )
    "[discussion]" Pleae help. Started with ML, MERN and Contributed but conflicted what to continue on..
    I've been learning MERN with a course (charging INR 6k) for last 3 weeks, won two hackathons and Contributed to OS projects. Now just 1 week ago I saw a remote ML job Profile that required OS Contribution to apply.I looked through their docs, learnt python, tensor flow basics and I Contributed to their Tensor flow and Paddle module and got 3 - 4 PR merged(Enough to apply) . Now I'm confused what to continue with, should I do both or do it one by One. I'm a recent graduate so need a job ASAP but I can give maximum time of the day to study. Please can someone give some advice so I can make my decision, as I'm unable to leave either TLDR; Learnt both ML and MERN, Contributed and now confused what to carry on with as I need a job asap. submitted by /u/Sinofdracry [link] [comments]  ( 9 min )
    [D] Compute percentage of languages present in a document
    Hi guys, I'm trying to compute the percentage of each language appearing in a document. My current use cases including two known languages and a set of documents which have the two languages mixed in (code switching, due to translation error). I'm training an ML model to make the output monolingual (leaned towards a designated language), so I need a reliable measure to estimate whether the ML model is making progress or not. Currently, I use lingua with the `compute_language_confidence_values()` function but the prediction is quite poor. For example, given a piece of text in Japanese and English: from lingua import Language, LanguageDetectorBuilder languages = [Language.ENGLISH, Language.JAPANESE] detector = LanguageDetectorBuilder.from_languages(*languages).build() detector.compute_language_confidence_values("わかりません hey do you understand me hey oh really") >>> [ConfidenceValue(language=Language.ENGLISH, value=1.0), ConfidenceValue(language=Language.JAPANESE, value=0.0)] So it's not quite correct (should be 0.8-0.2 or something similar), does anyone have any advice ? Or are there better softwares out there ? submitted by /u/KarmaCut132 [link] [comments]  ( 9 min )
    [D] does anyone have any papers on getting LLMs to output perfect formats?
    Does anyone have any literature on how to constrain the output of an LLM to a specified format? I’ve self hacked a method to get LLAMA to output a json of perfect schema. I tried to find something out of the box but I couldn’t find anything, and so I home brewed it. Thinking of publishing a paper on this but I don’t want to republish something already written, so asking here first. Thanks! submitted by /u/SnooPears7079 [link] [comments]  ( 9 min )
    [D] does anyone have any papers on getting LLMs to output perfect formats?
    Does anyone have any literature on how to constrain the output of an LLM to a specified format? I’ve self hacked a method to get LLAMA to output a json of perfect schema. I tried to find something out of the box but I couldn’t find anything, and so I home brewed it. Thinking of publishing a paper on this but I don’t want to republish something already written, so asking here first. Thanks! submitted by /u/SnooPears7079 [link] [comments]  ( 9 min )
    [R] Requesting help finding labs/ professors on certain discipline.
    submitted by /u/Present-Ad-8531 [link] [comments]  ( 9 min )
    [P] I built a Chrome extension that adds a chatbot to every GitHub repository
    submitted by /u/jsonathan [link] [comments]  ( 9 min )
    [Discussion] How to setup TPU parallelism/FSDP with HuggingFace Transformers
    My Code (Colab Link) Hi! For the past few days, I've been trying to fine-tune a model using TPU parallelism / FSDP with a Kaggle TPU notebook. The reason I need to set up FSDP is because the model I'm using is very large (Openlm's open llama 3b v2). When I try to fine-tune it, I quickly run out of memory on the TPU. Linked above is my code, if anyone has any useful information I would greatly appreciate it! Thank you!! Edit: Also providing my code through text here: !pip install sentencepiece !pip install -U accelerate !pip install -U transformers !pip install cloud-tpu-client !pip install torch-xla !pip install pyarrow import torch import torch_xla import torch_xla.core.xla_model as xm from transformers import AutoTokenizer, AutoModelForCausalLM, Trainer, TrainingArguments import pand…  ( 10 min )
    [D] Various models and git repo examples to learn Algo Trading
    Can someone list down one dataset for Algo Trading Simulation or free API endpoint , and i will go over following Algorithms: Basic KNN (moving average) SVR other ML models if any LSTM other DL models if any any RNN model basically my mission is to write a paper at then end of months comparing all algorithms with candle stick patterns over different strategies ​ submitted by /u/reactwebdev [link] [comments]  ( 9 min )
    [D][R] How often do Masters students doing a thesis publish in top ML conferences in their program period of 2 years ?
    Just curious to know the thoughts of other Masters/PhD students, professors or others in academia or industry research about their experience with regard to the title. submitted by /u/V1bicycle [link] [comments]  ( 9 min )
    I pretrained 16 language models from scratch with different tokenizers to benchmark the difference. Here are the results. [Research]
    I'm the author of TokenMonster, a free open-source tokenizer and vocabulary builder. I've posted on here a few times as the project has evolved, and each time I'm asked "have you tested it on a language model?". Well here it is. I spent $8,000 from my own pocket, and 2 months, pretraining from scratch, finetuning and evaluating 16 language models. 12 small sized models of 91 - 124M parameters, and 4 medium sized models of 354M parameters. Here is the link to the full analysis. Summary of Findings Comparable (50256-strict-nocapcode) TokenMonster vocabularies perform better than both GPT-2 Tokenizer and tiktoken p50k_base on all metrics. Optimal vocabulary size is 32,000. Simpler vocabularies converge faster but do not necessarily produce better results when converged. Higher compre…  ( 10 min )
    [P] Comgra: A library for debugging and understanding neural networks
    I'm a machine learning engineer and researcher. I got fed up with how difficult it is to understand why neural networks behave the way they do, so i wrote a library to help with it. Comgra (computation graph analysis) is a library you can use with pytorch to extract all the tensor data you care about and visualize it graphically in a browser. This allows for a much more detailed analysis of what is happening than the usual approach of using tensorboard. You can go investigate tensors as training proceeds, drill down into individual neurons, inspect single data sets that are of special interest to you, track gradients, compare statistics between different training runs, and more. This tool has saved me a ton of time in my research by letting me check my hypotheses much more quickly than normal and by helping me understand how the different parts of my network really interact. I hope this tool can save other people just as much time as it did me. I'm also open for suggestions on how to improve it further: Since I'm already gathering and visualizing a lot of network information, adding more automated analysis would not be much extra work. submitted by /u/Smart-Emu5581 [link] [comments]  ( 9 min )
    [D] Reinforced Self-Training (ReST) for Language Modeling (Video Paper Discussion)
    https://youtu.be/V4dO2pyYGgs ReST uses a bootsrap-like method to produce its own extended dataset and trains on ever higher-quality subsets of it to improve its own reward. The method allows for re-using the same generated data multiple times and thus has an efficiency advantage with respect to Online RL techniques like PPO. ​ Paper: https://arxiv.org/abs/2308.08998 ​ Abstract: Reinforcement learning from human feedback (RLHF) can improve the quality of large language model's (LLM) outputs by aligning them with human preferences. We propose a simple algorithm for aligning LLMs with human preferences inspired by growing batch reinforcement learning (RL), which we call Reinforced Self-Training (ReST). Given an initial LLM policy, ReST produces a dataset by generating samples from the policy, which are then used to improve the LLM policy using offline RL algorithms. ReST is more efficient than typical online RLHF methods because the training dataset is produced offline, which allows data reuse. While ReST is a general approach applicable to all generative learning settings, we focus on its application to machine translation. Our results show that ReST can substantially improve translation quality, as measured by automated metrics and human evaluation on machine translation benchmarks in a compute and sample-efficient manner. ​ Authors: Caglar Gulcehre, Tom Le Paine, Srivatsan Srinivasan, Ksenia Konyushkova, Lotte Weerts, Abhishek Sharma, Aditya Siddhant, Alex Ahern, Miaosen Wang, Chenjie Gu, Wolfgang Macherey, Arnaud Doucet, Orhan Firat, Nando de Freitas submitted by /u/ykilcher [link] [comments]  ( 9 min )
    MAML convergence with GAN [D]
    I had been exploring convergence properties of MAML and there are some recent works establishing convergence under certain conditions. I am trying to understand how this would play out with GAN's, I know that in general training is generally unstable and there are a lot of issues such as memorization and mode collapse under this regime, but I am looking for a theoretical result, for instance we know that GAN's converge under ideal conditions and we also know that MAML converges, can we make any comments on the convergence properties of GAN's when trained using MAML, ideally a neat trick to know if they will converge based on what we already know? The proof for MAML convergence is fairly complicated and I expect that a proof that has additional second order gradient terms and feedback loops will probably involve a lot of work and I am wondering if anyone could provide some sort of insight or intuition as to what such a result would look like? Thanks submitted by /u/ashblue21 [link] [comments]  ( 9 min )
    [D] Linear regression for time series data
    Problem: Given time series data of the last few years, one data point per day (eg price of a product or sales made this day). My job is to predict the next 7 days, ie. 7 scalars. Approach: Train one model for each time lag. The first model predicts tomorrow, the second model the day after tomorrow and so on (7 models in total). The features are the last prices of the last 7 days and some saisonal features (calendar week, price on this day last year and so on). Question: is there anything wrong with this approach? It doesn’t feel like the most elegant method to train 7 separate models. The problem with using a single model is, that this model must be able to predict 7 values of different points in time (and i don’t want to give the model input data of 7 days and let it predict all 7 scalars at once. The model should only use the features of a single day to predict this day). The 2 other to options I have considered are to train an autoregressive model (model just learns to predict the next day. To predict the day after tomorrow you give it its own prediction as input). Or to build a „time-lag“ feature, which tells the model how far in the future this datapoint lies. But this doesn’t make sense, because there is nothing like a weekly trend or so. What do you think? The autoregressive approach is elegant, but its implementation and maintenance is complex. submitted by /u/Individual-Cause-616 [link] [comments]  ( 9 min )
    [P] [D] Data augmentation using Stable diffusion
    I've written a post on how to use stable diffusion for data augmentation for object detection and segmentation. Please check it out and share some insights on how to evaluate these kind of tasks. https://medium.com/@kaushik.koneripalli/satellite-image-data-augmentation-using-stable-diffusion-for-object-detection-segmentation-8b1fe87b969 submitted by /u/perceptron333 [link] [comments]  ( 9 min )
    [P] Open-source star removal tool using Pix2Pix
    I created a open-source star removal tool "star2k13". Would love to hear some feedback . Here is link to the tool : Starrem2k13: Open source star removal tool (code2k13.github.io) Works on most operating systems and docker submitted by /u/Key_Education_2557 [link] [comments]  ( 9 min )
    [P] Coding LLaMA 2 from scratch in PyTorch, with step by step explanation of KV Cache, Grouped Query Attention, Rotary Positional Embedding, RMS Normalization, SwiGLU and much more!
    submitted by /u/hkproj_ [link] [comments]  ( 9 min )
  • Open

    Meta's DINOv2 and FACET sets the bar in computer vision model fairness
    Meta has recently unveiled DINOv2, its cutting-edge computer vision model, and FACET, a comprehensive benchmark to ensure AI fairness. These developments promise improved automation and better inclusivity in the AI sector. If you want to stay on top of the latest trends and insights in AI, look here first. https://i.redd.it/zg47br3xv3mb1.gif DINOv2 for advanced visual tasks Meta has made the powerful DINOv2 model available under the Apache 2.0 license, employing self-supervised learning to enhance image segmentation and depth estimation. This broader use model encourages further innovation and practical application in the computer vision community, driving progress in the AI industry. FACET for enhanced AI fairness Given the inherent difficulty and risks in ensuring fairness in computer vision, Meta introduced FACET. FACET has been developed to benchmark fairness across computer vision models performing tasks such as detection or classification, considering a wide array of demographic attributes. This revolutionary tool enables a better understanding of potential biases in AI models, helping to address fairness and robustness concerns. Wider implications Preliminary studies indicate performance disparities across some demographic groups within computer vision models. FACET allows researchers to track these divergences and monitor the implementation of corrective measures. Meta actively encourages researchers to use FACET for fairness benchmarking in other visual/multimodal tasks. For instance, the DINOv2 model's performance was analyzed with FACET — facilitating insights into potential biases. (source) P.S. If you like such analysis, I write a free newsletter tracking significant news and research in AI. Professionals from Google, Meta, and OpenAI are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    'Fluid' - (Modelscope image2video)
    submitted by /u/glenniszen [link] [comments]  ( 9 min )
    What....this is insane...
    submitted by /u/the_anonymizer [link] [comments]  ( 9 min )
    I'm literally speachless.. 8O
    submitted by /u/the_anonymizer [link] [comments]  ( 9 min )
    After Getting Banned in Schools, OpenAI Launches ChatGPT Tool for Teachers
    submitted by /u/Agitated-Spell3979 [link] [comments]  ( 9 min )
    Retro sci-fi trailer made with AI
    submitted by /u/filmcrux [link] [comments]  ( 9 min )
    One-Minute Daily AI News 9/2/2023
    SAG-AFTRA, the union for US actors, is moving towards a potential strike against video game publishers, it’s announced.[1] Russia builds MSU-270 supercomputer for AI and HPC research.[2] Chuck Schumer has announced that his office will be meeting with top players in the artificial intelligence field later this month. Invited to the upcoming summit are tech megabillionaire Elon Musk, his one-time hypothetical sparring partner Meta CEO Mark Zuckerberg, OpenAI CEO Sam Altman, Google CEO Sundar Pichai, NVIDIA President Jensen Huang, and Alex Karpy, CEO of defense contractor creep Palantir.[3] Google expands AI compute offerings, partnership with Nvidia and more.[4] Sources: [1] https://www.videogameschronicle.com/news/actors-union-sag-aftra-could-launch-video-game-strikes-over-wages-and-ai/ [2] https://www.tomshardware.com/news/russian-400-petaflops-supercomputer-for-ai-comes-online [3] https://gizmodo.com/chuck-schumer-elon-musk-mark-zuckerberg-palantir-nvidia-1850788302 [4] https://www.itworldcanada.com/article/google-expands-ai-compute-offerings-partnership-with-nvidia-and-more/545625 submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    The Puzzle Created by GPT-4 That Even GPT-4 Can't Solve, Yet Humans Did! First Challenge Revealed.
    submitted by /u/stefanbg92 [link] [comments]  ( 9 min )
  • Open

    Expert systems and RL
    I'm interested in learning more about how expert systems and explicit knowledge injection by various means can be used to guide and improve RL, both in terms of capability and in terms of reduced training times. I have a hard time finding good resources for this topic. What are some must-read papers on this topic? Are there any good youtube channels or online courses? I'm particularly interested in resources that feature practical implementations submitted by /u/worstthingsonline [link] [comments]  ( 9 min )
    Why am i getting this error?
    When I try to call check_env() from this code: from stable_baselines3.common.env_checker import check_env from agentStable import snakeEnv env = snakeEnv() check_env(env) I get this error: Traceback (most recent call last): File "myDir", line 6, in check_env(env) File "<myDir\Python\Python310\lib\site-packages\stable_baselines3\common\env_checker.py", line 396, in check_env assert isinstance( AssertionError: Your environment must inherit from the gymnasium.Env class cf. https://gymnasium.farama.org/api/env/ ​ Here is my agentStable.py code: import gym from gym import spaces import numpy as np from enum import Enum from collections import namedtuple import numpy as np from colorama import Fore from gameStable import SnakeGameAI, Direction, Point class snakeEnv(gym.Env): met…  ( 10 min )
    Autonomous Driving | Tight, dynamic and chaotic traffic | India | Swaayatt Robots
    submitted by /u/shani_786 [link] [comments]  ( 9 min )
    Zoomposium with Professor Dr. John-Dylan Haynes: "In search of the code of the brain"
    Zoomposium with Professor Dr. John-Dylan Haynes: "In search of the code of the brain" In this new episode of our "Zoomposium Series" on the topic of "Brain Research", my colleague Axel Stöcker from the "Blog der großen Fragen" and I have managed to win the well-known and renowned brain researcher and psychologist Professor Dr. John-Dylan Haynes for an interview. John-Dylan Haynes has been a professor of theory and analysis of long-range brain signals at the Bernstein Center for Computational Neuroscience and the Berlin Center for Advanced Neuroimaging (BCAN) at Charité and Humboldt University in Berlin since 2006. There, Professor Haynes and his team are "In Search of the Brain's Code". In order to crack this, larger amounts of data are collected from the functional magnetic resonanc…  ( 10 min )
    Considering in use between Model-free vs Model-based, and need suggestion in algorithms.
    In training MFRL, which is mostly simulation, why don't we use MBRL instead as the environment is accessible? **Correct me if I misunderstand in any. From my understanding, Model-Free RL (MFRL) is generally used for control tasks where environment is not accessible. It takes a sample of an experience from the environment and uses it to adjust its policy, either policy-based, value-based, and actor-critic. Model-Based RL (MBRL) uses a transition model to optimize the optimal policy like in model predictive control (MPC). I am interested in using RL for control multiple and continuous actions in continuous stochastic environment. For now, I am moving around DDPG. Do you have any suggested algorithm that match to my task? submitted by /u/AnnonymeowCat [link] [comments]  ( 9 min )
    Understanding how to get a dataset for more complex environments.
    In videos like this, it talks about how you need to find the right fit for your data which is shown on a scatter plot. I understand how this works when you have a dataset for something but how does it work when you are trying to train a DQN to play snake (eating apples and getting longer game). I have been struggling to tune my hyperparameters as well as figure out how many hidden neurons and hidden layers I need. I have found that right now 256 hidden neurons and 2 hidden layers works best. Please tell me if this topic has flown completely over my head and I am missing something. Thank you! submitted by /u/MrHank2 [link] [comments]  ( 9 min )
    FinRL and developing ML - skills and labour market
    If I wanted to hire an ML/RL student/full-time employee to help my firm develop some FinRL/other RL algorithms, what skills should I be looking for? How "generalized" are RL skills - from what I can tell alot of the RL I see posted here has to do with video games? I've stumbled across FinRL recently and would like to hire some help to develop some FinRL code. What's the market like for RL? I know all the rage is LLM's but how different is RL and does the labour market care about the difference? Based in Canada fyi. Won't be hiring for a few months. submitted by /u/Thrumpwart [link] [comments]  ( 9 min )
  • Open

    DFT conventions: NumPy vs Mathematica
    Just as there are multiple conventions for defining the Fourier transform, there are multiple conventions for defining the discrete Fourier transform (DFT), better known as the fast Fourier transform (FFT). [1] This post will look at two DFT conventions, one used in Python’s NumPy library, and one used in Mathematica. There are more conventions in […] DFT conventions: NumPy vs Mathematica first appeared on John D. Cook.  ( 6 min )

  • Open

    [D] Where did the research go?
    This sub used to be my go-to place for finding out cool new ML research but sadly it has now become a "generative AI" "AI productisation" circlejerk. I was wondering where people now go to discover new ML research (besides ArXiv of course!) submitted by /u/blabboy [link] [comments]  ( 9 min )
    [P] What are some good MLE project ideas ?
    What tech stack, what frameworks should I specially learn and use in my MLE project ? There are so much things mentioned in job offers, what would you suggest me to focus on ? I thought of fine-tuning LLM and deploying it using AWS. I'd prefer this project to be NLP oriented. I read about things like MLFlow, Apache Spar, Kubernetes etc. and don't know what to focus on. PS: I am currently a data scientist, and have recently finished a body pose estimation + action recognition web app, using Python/OpenCV/Mediapipe/Flask/Torch submitted by /u/tflbbl [link] [comments]  ( 9 min )
    [P][D] How do I improve car detection performance?
    For a college project I am trying to detect the number of cars in the first 3 rows of a parking lot. Here is my roboflow project page: https://universe.roboflow.com/boaztheostrich/cartest-tyaur As you can see I have been able to get my map score as high as .995 however I am still having difficulty consistently detecting cars in some edge cases. What I am currently testing is increasing the resolution from 1280x720 to 2048. I am new to all of this so any tips or tricks would be greatly appreciated. I am currently using google colab for training although I am considering switching over to vast.ai submitted by /u/johndowlelxdxdxdxdxd [link] [comments]  ( 9 min )
    Need collaborators for a natural language interface [P]
    We have it pretty functional but we're a small team so we need more people. https://github.com/apssouza22/chatflow Promo video: https://www.reddit.com/r/AGIunderconstruction/comments/168fsyr/come_build_open_source_natural_language/?utm_source=share&utm_medium=web2x&context=3 submitted by /u/Cold-Explanation-984 [link] [comments]  ( 9 min )
    [D] How to create and use multiple dataframes in pyspark?
    Hey All, I have to read in multiple JSON files with each one containing objects in an array. For each file I want to create a data frame (A matrix might also work) where the rows and columns are just integers pointing to a string. Like this ​ - 0 1 2 0 dsad asd ad 1 asd asd grth 2 ter xc wer ​ Using the same JSON file I also want to process the objects inside the arrays using the dataframe (matrix) above. So the process (pipeline) would be something like: |==> Create Matrix =======>| JSON file =>| | ===> Use matrix to process object. |==> Individual objects ==>| ​ I have been looking through the docs but still unsure how to do this. Should I use a dataframe or a spark matrix? How do I split the objects into parts and also generate the matrix? How do I combine dataframes which isn't joining? Just a point in the right direction would be great. Thanks in advance for this relatively simple question. submitted by /u/atticusfinch975 [link] [comments]  ( 9 min )
    [D] can somone help me get this paper
    I don't have access can somone help me get it please Thank you https://www.worldscientific.com/doi/abs/10.1142/S0218001418560062 submitted by /u/SilenceOfTheUnicorns [link] [comments]  ( 9 min )
    [D]Tips (Algorithms)
    I have started learning ML a month ago... Did a foundational Google course and read from some other sources..hav learnt most of the theories....What's the best place to learn algorithms according to you? Any other tips are also welcome submitted by /u/Buri-Buri_zaemon [link] [comments]  ( 9 min )
    [D] RX 7900 XTX vs RTX 4080
    I know AMD is working on making ROCM support for RDNA 3, would that rival nvidia? and would there be enough support for it to be usable? Nvidia cards are way more expensive and i would like to use it for gaming besides Machine learning for my study. Also, would this be overkill? will an RTX 4070 or an RX 7900 XT also do the job just fine? i am new to ML and won't be using it till early 2024, thank you all for reading. submitted by /u/RepresentativeIll155 [link] [comments]  ( 9 min )
    [D] How to describe XGBoost, Boosting and Bagging?
    Hi Can someone here please help me with this algorithm? What is the “Boosting” part of the algorithm? To my (limited) understanding XGBoost is an ensemble learning algorithm that uses many decision trees (efficiently), where each tree tries to correct the loss of the previous one. But I’m not sure how this is connected to “Boosting” and then it’s cousin “Bagging” Any intuition that may help me here? submitted by /u/Ok_Reality2341 [link] [comments]  ( 9 min )
    [D] A case for summaries over abstracts
    I usually peruse the abstract of a paper before deciding on whether to read it or not. However, lately I've started longing for more personalized summaries. I wonder what others think of abstract vs summaries and their preferences of the latter over former ? In your opinion, how far has the field progressed in summarization (https://paperswithcode.com/dataset/scitldr) ? submitted by /u/JurrasicBarf [link] [comments]  ( 9 min )
    [D] NeurIPS reviewers edited review and score after discussion period: can they delete their own revision history?
    Hi, we have a paper submission to NeurIPS and we have two reviewers who changed their scores and review content silently by editing the original review comment and score after the discussion period. The edited review comment now discusses entirely different point. We would like to raise this concern to AC but the thing is that we didn’t save the original review comment, and the “revision history” for some reason doesn’t show the previous content, other than the entry that there was previous version. But this revision history overall isn’t inconsistent (showing the last two history after the discussion period, but the ones before the period is not shown) Can reviewers delete their own revision history in OpenReview tool? I don’t know if this is a bug or they deleted them with an intention. submitted by /u/mayasang [link] [comments]  ( 9 min )
    [D] Testing at 80-95% - Newly collected recent data 55% - WHY!?
    I am currently do some predictions on some market data, I'm using both XGBoost and lightGBM (not both at the same time just experimenting using both algorithms). I have around 2500 features and 40k rows of data in my dataset which is being split 75% = train, 12.5% = valid, 12.5% = Test. The balance of the data is massively imbalanced with a binary classification. On training im seeing 0 = 20192 and 1 = 8337. I am not using SMOTE or undersampling but rather using the alogirhms own parameters to combat the Imbalance, for example scale_pos_weight: y_data[0]/y_data[1]. Training is going very well, im using hyperopt tuner to tune my paramets and usually on average get 75% accuracy on testing, training will usually be a little higher such as 77% and valid will be fairly close to test. But the …  ( 10 min )
    [P] Build a Recommender System that Includes Term / Vector Recall, DeepFM Ranking, Inference Engine and Web Application.
    Hello everyone! I've noticed that most beginner-level tutorials on recommender systems primarily focus on model training, with limited information about deploying them in a production environment. Additionally, the different usage of models in the recall (retrieval) and ranking modules can indeed be confusing for beginners. Recently, I've been working on a recommender system project that encompasses both offline development and online deployment, covering both recall and ranking modules. The entire project is developed using Python and executed on a single laptop. All components are contained within Docker, ensuring no impact on the local environment. The GitHub repo: https://github.com/akiragy/recsys_pipeline You can follow the commands provided in the README to run it. This project primarily utilizes PyTorch, Redis, Elasticsearch, Feast Feature Store, Triton Inference Server, and Flask. PyTorch is used for training the FM model for recall and the DeepFM model for ranking. Redis serves as the store for user terms and vectors, while Elasticsearch is used to create an item term index and a vector index. Redis and Elasticsearch form the recall module. Feast is utilized to store user and item features, while Triton serves as a real-time prediction engine. Feast and Triton form the ranking module. Flask is deployed as the web server, receiving recommendation requests and returning responses. Thanks for checking it out! submitted by /u/Johann_SebastianBach [link] [comments]  ( 9 min )
    [Research] Benchmarking Neural Network Generalization for Grammar Induction
    Benchmark: 🧘 BLISS – a Benchmark for Language Induction from Small Sets https://github.com/taucompling/bliss/ Paper: https://arxiv.org/abs/2308.08253 submitted by /u/nurikolan [link] [comments]  ( 9 min )
    [R] Improving model results with EDCR
    We released another preprint on a neuro-symbolic approach called "metacognitive error correction and detection rules" (EDCR). The idea is that if you have a trained neural model, you can symbolically fine tune the results with rules. In this initial study, we apply it to the classification of GPS movement traces. Video: https://www.youtube.com/watch?v=d_OV4lap_rk Preprint: https://arxiv.org/abs/2308.14250 Code: https://github.com/lab-v2/Error-Detection-and-Correction Further information: https://neurosymbolic.asu.edu/metacognition/ In the example below, we show the results for a single class. The rules detect errors by identifying classifications that may be incorrect and then re-assign to a new class. While recall can drop for a given class, we can bound the drop in recall with a hyperparameter - but this is guaranteed to improve precision. This is illustrated in the below figure. We show this approach leads to an overall improvement in accuracy over the base model, including the state-of-the-art. We also examine the effects when encountering classes not seen in the model's training data. We provide theoretical as well as empirical results and believe this approach can be used in other use-cases in the future. ​ https://preview.redd.it/3z0cdp80dulb1.png?width=635&format=png&auto=webp&s=2eff6ce0f2c7b6983dbfbc030f0f7993010a30fb submitted by /u/Neurosymbolic [link] [comments]  ( 9 min )
    [D] 10 hard-earned lessons from shipping generative AI products over the past 18 months
    Hey all, I'm the founder of a generative AI consultancy and we build gen AI powered products for other companies. We've been doing this for 18 months now and I thought I share our learnings - it might help others. ​ It's a never ending battle to keep up with the latest tools and developments. By the time you ship your product it's already using an outdated tech-stack. There are no best-practices yet. You need to make a bet on tools/processes and hope that things won't change much by the time you ship (they will, see point 2). If your generative AI product doesn't have a VC-backed competitor, there will be one soon. In order to win you need one of the two things: either (1) the best distribution or (2) the generative AI component is hidden in your product so others don't/can't copy you. AI researchers / data scientists are suboptimal choice for AI engineering. They're expensive, won't be able to solve most of your problems and likely want to focus on more fundamental problems rather than building products. Software engineers make the best AI engineers. They are able to solve 80% of your problems right away and they are motivated because they can "work in AI". Product designers need to get more technical, AI engineers need to get more product-oriented. The gap currently is too big and this leads to all sorts of problems during product development. Demo bias is real and it makes it 10x harder to deliver something that's in alignment with your client's expectation. Communicating this effectively is a real and underrated skill. There's no such thing as off-the-shelf AI generated content yet. Current tools are not reliable enough, they hallucinate, make up stuff and produce inconsistent results (applies to text, voice, image and video). submitted by /u/BootstrapGuy [link] [comments]  ( 10 min )
    [D] What is the best text-to-speech tool (preferably free) currently?
    Hi everyone, I need a TTS tool that sounds exactly like a human voice. I want to use it to edit some of my YouTube videos. I see a lot of TTS platforms around. Which do you recommend? I hope this isn't too much to ask. I would gladly appreciate it. Thanks in advance. submitted by /u/cessilh1 [link] [comments]  ( 9 min )
    [R] Real-time Road Segmentation without Dense Depth Images
    You can use the following code if you want to detect the road in real-time in your vehicle/robot : https://github.com/ErkanMilli/3MT-RoadSeg . One of the main problems in road segmentation by using depth was that if a region is flat, such as walls, it may be detected as road. This was already a known phenomenon and to overcome this, surface normal estimation was used. But, SNE requires dense depth images. Instead, we used a multi-task architecture, and used surface normals as an auxiliary loss, which reduced computation time significantly and also we don't need a dense depth image. Only LiDAR (which is sparse in nature) is sufficient. submitted by /u/ozgurerkent [link] [comments]  ( 9 min )
    [P] Threshold of acceptability in a fill-mask task with BERT
    Hi everyone, I am very new to machine learning and statistics, and am currently building an experiment that includes probing the knowledge of a bert-base-uncased model in a fill-mask test, without fine-tuning - just the regular pretrained model. I want to see the models knowledge of certain grammatical notions in English - whether its judgements are similar to those of humans or not :) My point is to give the model inputs like: "what do you call a room filled with socks? you called it a [MASK] filled room", or "a monster who eats rats is called a [MASK] eater", and check the probabilities it gives to the corresponding singular and plural token, e.g. in the first case I want to probe "sock" \ "socks", and in the second case "rat" / "rats". I built a script which does exactly this - pulls…  ( 10 min )
    [R][P] One class is hard to detect in vision project
    Hi, I’ve been working for a while now on a project to detect points in medical images which are to be classified into 3 different classes, but my UNet really struggles to predict one of the 3 classes (>70% score when excluding this class vs ~30% when not). I have tried putting a separate decoder just for this one class but the results are worse, and I don’t really have other ideas to better my results. Do you have any ideas/techniques to help me improve my results? Thanks ! submitted by /u/maths_and_baguette [link] [comments]  ( 9 min )
    [D] Stanford's ML for Graphs course
    Hi everyone. Has anybody taken this course from Stanford https://online.stanford.edu/courses/xcs224w-machine-learning-graphs or any other course in the same online portal? Was it worth it? I am considering to apply. Thanks submitted by /u/Realistic-Bed2658 [link] [comments]  ( 9 min )
    [R] Recurrent Forward Forward: Accuracy Issues
    Problem I recently did a bit of a career switch from big tech IoT Rust job, into a machine learning research role. For the last few months, I have been working on building out the Recurrent Forward Forward model from Hinton's Forward Forward paper (Fig3): https://arxiv.org/abs/2212.13345 I have an implementation, but have been stuck for the past 4-6 weeks on trying to improve the accuracy. My implementation is only getting 95% test accuracy on MNIST. Hinton and Alex Ororbia (author of this) have been able to achieve high test accuracy (99%+) using this architecture, so I know it is possible. What I have tried I have tried many different things at this point: Different activation functions. Weight initialization. Regularization techniques like transforms, jitter, and dynamic nega…  ( 11 min )
    [D][R] Why do we need the convolution in upsample and downsample blocks?
    Hi fellow computer scientists and engineers, ​ I've been wondering why do we often have a convolution inside every upsample and downsample block. Well, it makes sense, if you intend to upscale some features and use a bilinear interpolation, then some error can be introduced due to interpolation inaccuracies. This is where convolution layer comes handy to help and support the upscaling. But is this really the reason behind it? Or is there a deeper explanation? ​ Also, just for the sake of curiosity. What if the scale_factor of an upsample block was 1. Should we still keep the convolution layer? or just get rid of all the upsample block since there is no actual "upsampling" being done at least in the context of the tensor dimensions. ​ Thank you :) submitted by /u/Christs_Elite [link] [comments]  ( 9 min )
  • Open

    Markov Property
    Is that wrong if a problem doesn't satisfy the Markov property, I cannot solve it with the RL approach either? submitted by /u/nimageran [link] [comments]  ( 9 min )
    Negative KL-divergence RLHF implementation
    I am struggling to understand one part of the FAQ of the transformer reinforcement learning library from HuggingFace: What Is the Concern with Negative KL Divergence? If you generate text by purely sampling from the model distribution things work fine in general. But when you use the generate method there are a few caveats because it does not always purely sample depending on the settings which can cause KL-divergence to go negative. Essentially when the active model achieves log_p_token_active < log_p_token_ref we get negative KL-div. This can happen in a several cases: top-k sampling: the model can smooth out the probability distribution causing the top-k tokens having a smaller probability than those of the reference model but they still are selected min_length: this ignores the EOS token until min_length is reached. thus the model can assign a very high log prob to the EOS token and very low prob to all others until min_length is reached batched generation: finished sequences in a batch are padded until all generations are finished. The model can learn to assign very low probabilities to the padding tokens unless they are properly masked or removed.These are just a few examples. Why is negative KL an issue? The total reward R is computed R = r - beta * KL so if the model can learn how to drive KL-divergence negative it effectively gets a positive reward. In many cases it can be much easier to exploit such a bug in the generation than actually learning the reward function. In addition the KL can become arbitrarily small thus the actual reward can be very small compared to it. I understand why the KL-divergence that is computed here is an approximation that can be negative as opposed to the real one. However, I cannot wrap my head around the details of why these specific sampling parameters would lead to negative KL-divergence. Could someone elaborate on these points? submitted by /u/Loud_Appointment_418 [link] [comments]  ( 10 min )
    Working on a project which involves creating an agent to work on chess environment
    I am using DQN Algorithm and A2C algorithm (Not using any lookaheads to see potential moves and only using self-learning coz my teacher asked me not to look into future combinations and let it play and understand itself) separately to check the performance of the agent and the neural network gives probabilities of the moves in the size of 4096 (64*64) . But the probabilities are decreasing with each and every move performed and they are overfitting to one move which is an invalid move (same case for both dqn and a2c) so in the bellman equation i removed the next reward prediction and put constant value of 1 to check whether it is at least trying to increase the probability for valid moves but that doesnt seems to be the case because it is still giving probability of 1 for an invalid move. and there is also this case where the probabilities are getting so small they are becoming nan values. can someone provide some insights for me to look into submitted by /u/S_U_B_B_U [link] [comments]  ( 9 min )
    UCL Reinforcement learning lectures
    I see lectures on youtube from UCL+DeppMind on RL spanning from 2015 through 2021. Which one would you say is the best to follow? I've heard many good things about David Silver's lectures, but how do the most recent, 2021, lectures compare? submitted by /u/Practical_Ad_8782 [link] [comments]  ( 9 min )
  • Open

    DFT mandalas
    Math books often use some illustration from the book contents as cover art. When they do, there’s often some mystery to the cover art, and a sense of accomplishment when you get far enough into the book to understand the significance of the cover. (See examples here.) William L. Briggs and Van Emden Henson wrote […] DFT mandalas first appeared on John D. Cook.  ( 5 min )
  • Open

    Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models
    submitted by /u/nickb [link] [comments]  ( 9 min )
    Metacognition with EDCR
    submitted by /u/Neurosymbolic [link] [comments]  ( 9 min )
  • Open

    Elon Musk's X to leverage public data for AI model training
    Elon Musk's X revealed its plans to utilize user data and publicly available information in training AI models. Despite Musk's assurance that only public data will be used, concerns around privacy linger. For expert insights into AI developments, look here first. https://preview.redd.it/mj64uof9rvlb1.png?width=2000&format=png&auto=webp&s=a856ce3e4b6063ebf7a585df3338142defba6323 X's approach to AI training Under the most recent privacy policy, X will harness the personal data it collects and publicly accessible information for its machine learning algorithms. Musk assures only publicly accessible data will be used, safeguarding private user information like DMs. However, with X having disbanded its press operation, more specific details about the data collected and its intended use still need to be provided. Unfolding plans of Musk Despite X's quiet stance on AI, Musk recently launched xAI, aspiring "to understand the true nature of the universe." xAI's homepage discloses plans to sync with X closely, possibly using collected user data to progress the mission. A competitive stance against LinkedIn suggests a possible additional motive for data collection, speculating an enhanced job and education section on X. Despite concerns about selling user data for revenue, concrete evidence is needed to support this argument, reflecting Twitter's previous strategy. (source) P.S. If you like this kind of analysis, I write a free newsletter that tracks the most relevant news and research in AI and tech. Professionals from Google, Meta, and OpenAI are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    The mystery of AI learning is solved by Stanford researchers
    Say goodbye to the black box of deep learning and hello to a new era of transparent, efficient, and ethical AI. Find out how this changes EVERYTHING! https://kinews24.de/stanford-cracks-the-ai-code-the-groundbreaking-law-of-equi-separation ​ submitted by /u/myreddit333 [link] [comments]  ( 9 min )
    Ideas for a high school AI/ML club
    I'm thinking of creating an AI club at my high school. The problem is, unlike something like math or coding, there aren't many competitions suitable to beginners and not a lot of previous template content to follow. Therefore, I need to forge my own path. I am curious what your ideas are for some engaging, high-school-friendly topics and events to have, especially if we can only meet for 30 minutes a week. Thanks in advance! submitted by /u/0xCUBE [link] [comments]  ( 9 min )
    An AI to help with my psychology assignment?
    My psychology masters assignments are to be handwritten and hence a somewhat painstaking process. To streamline I was looking for an AI that can guide me on the concepts and understanding of the given psychology subjects. I don't want to use it as a shortcut just a tool for studying and guiding. In accordance with books and Google. Can anyone know of such an AI? submitted by /u/Maddragon0088 [link] [comments]  ( 9 min )
    follow me on X for ai news without the garbage. just made an account bc im tired of these annoying accounts and decided to just make my own ai news account
    submitted by /u/nicdunz [link] [comments]  ( 9 min )
    Could AI be the game-changer in tackling the opioid epidemic?
    The stubborn and complex opioid epidemic may finally meet its match—AI. As the crisis continues taking a fearsome toll, experts are turning to advanced technology in their ongoing battle. If you want to stay on top of the latest trends and insights in AI, look here first. https://preview.redd.it/vm23xflorqlb1.jpg?width=1390&format=pjpg&auto=webp&s=212b88fb01eb0f7afaa5011120267ac4ce37ee35 AI’s evolving role in tackling the opioid crisis With a legacy of over 1 million overdose deaths since 1999, the opioid crisis has stubbornly resisted traditional preventive and regulatory measures. The latest AI-fueled developments offer newfound hope. Groundbreaking AI innovations are focusing on identifying individuals at potential risk, monitoring treatment progress, and predicting relapse probabilities. Decoding social media behavior offers an effective outlet for early intervention. More radically, AI-enabled wearable devices are being developed to detect overdose symptoms and automatically deliver lifesaving treatment. AI: A double-edged sword? Despite its promising potential, AI application in this sphere also raises concerns around privacy rights and misinformation. Facial recognition technology could lead to discrimination, while the risk of false data being fed into chatbots causing harm cannot be undermined. Trust in AI and its appropriate deployment will be crucial to ensuring its positive contribution rather than being a dystopian threat. P.S. If you like this kind of analysis, you’ll love my free newsletter that tracks the most relevant news and research in AI and tech. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
  • Open

    Knowledge Graph Embeddings in the Biomedical Domain: Are They Useful? A Look at Link Prediction, Rule Learning, and Downstream Polypharmacy Tasks. (arXiv:2305.19979v2 [cs.LG] UPDATED)
    Knowledge graphs are powerful tools for representing and organising complex biomedical data. Several knowledge graph embedding algorithms have been proposed to learn from and complete knowledge graphs. However, a recent study demonstrates the limited efficacy of these embedding algorithms when applied to biomedical knowledge graphs, raising the question of whether knowledge graph embeddings have limitations in biomedical settings. This study aims to apply state-of-the-art knowledge graph embedding models in the context of a recent biomedical knowledge graph, BioKG, and evaluate their performance and potential downstream uses. We achieve a three-fold improvement in terms of performance based on the HITS@10 score over previous work on the same biomedical knowledge graph. Additionally, we provide interpretable predictions through a rule-based method. We demonstrate that knowledge graph embedding models are applicable in practice by evaluating the best-performing model on four tasks that represent real-life polypharmacy situations. Results suggest that knowledge learnt from large biomedical knowledge graphs can be transferred to such downstream use cases. Our code is available at https://github.com/aryopg/biokge.  ( 3 min )
    CongNaMul: A Dataset for Advanced Image Processing of Soybean Sprouts. (arXiv:2308.15690v2 [cs.CV] UPDATED)
    We present 'CongNaMul', a comprehensive dataset designed for various tasks in soybean sprouts image analysis. The CongNaMul dataset is curated to facilitate tasks such as image classification, semantic segmentation, decomposition, and measurement of length and weight. The classification task provides four classes to determine the quality of soybean sprouts: normal, broken, spotted, and broken and spotted, for the development of AI-aided automatic quality inspection technology. For semantic segmentation, images with varying complexity, from single sprout images to images with multiple sprouts, along with human-labelled mask images, are included. The label has 4 different classes: background, head, body, tail. The dataset also provides images and masks for the image decomposition task, including two separate sprout images and their combined form. Lastly, 5 physical features of sprouts (head length, body length, body thickness, tail length, weight) are provided for image-based measurement tasks. This dataset is expected to be a valuable resource for a wide range of research and applications in the advanced analysis of images of soybean sprouts. Also, we hope that this dataset can assist researchers studying classification, semantic segmentation, decomposition, and physical feature measurement in other industrial fields, in evaluating their models. The dataset is available at the authors' repository. (https://bhban.kr/data)  ( 2 min )
    Online Distributed Learning with Quantized Finite-Time Coordination. (arXiv:2307.06620v2 [cs.LG] UPDATED)
    In this paper we consider online distributed learning problems. Online distributed learning refers to the process of training learning models on distributed data sources. In our setting a set of agents need to cooperatively train a learning model from streaming data. Differently from federated learning, the proposed approach does not rely on a central server but only on peer-to-peer communications among the agents. This approach is often used in scenarios where data cannot be moved to a centralized location due to privacy, security, or cost reasons. In order to overcome the absence of a central server, we propose a distributed algorithm that relies on a quantized, finite-time coordination protocol to aggregate the locally trained models. Furthermore, our algorithm allows for the use of stochastic gradients during local training. Stochastic gradients are computed using a randomly sampled subset of the local training data, which makes the proposed algorithm more efficient and scalable than traditional gradient descent. In our paper, we analyze the performance of the proposed algorithm in terms of the mean distance from the online solution. Finally, we present numerical results for a logistic regression task.  ( 2 min )
    Neural ShDF: Reviving an Efficient and Consistent Mesh Segmentation Method. (arXiv:2306.11737v2 [cs.GR] UPDATED)
    Partitioning a polygonal mesh into meaningful parts can be challenging. Many applications require decomposing such structures for further processing in computer graphics. In the last decade, several methods were proposed to tackle this problem, at the cost of intensive computational times. Recently, machine learning has proven to be effective for the segmentation task on 3D structures. Nevertheless, these state-of-the-art methods are often hardly generalizable and require dividing the learned model into several specific classes of objects to avoid overfitting. We present a data-driven approach leveraging deep learning to encode a mapping function prior to mesh segmentation for multiple applications. Our network reproduces a neighborhood map using our knowledge of the \textsl{Shape Diameter Function} (SDF) method using similarities among vertex neighborhoods. Our approach is resolution-agnostic as we downsample the input meshes and query the full-resolution structure solely for neighborhood contributions. Using our predicted SDF values, we can inject the resulting structure into a graph-cut algorithm to generate an efficient and robust mesh segmentation while considerably reducing the required computation times.  ( 2 min )
    Diffusion Policies for Out-of-Distribution Generalization in Offline Reinforcement Learning. (arXiv:2307.04726v2 [cs.LG] UPDATED)
    Offline Reinforcement Learning (RL) methods leverage previous experiences to learn better policies than the behavior policy used for data collection. In contrast to behavior cloning, which assumes the data is collected from expert demonstrations, offline RL can work with non-expert data and multimodal behavior policies. However, offline RL algorithms face challenges in handling distribution shifts and effectively representing policies due to the lack of online interaction during training. Prior work on offline RL uses conditional diffusion models to represent multimodal behavior in the dataset. Nevertheless, these methods are not tailored toward alleviating the out-of-distribution state generalization. We introduce a novel method, named State Reconstruction for Diffusion Policies (SRDP), incorporating state reconstruction feature learning in the recent class of diffusion policies to address the out-of-distribution generalization problem. State reconstruction loss promotes more descriptive representation learning of states to alleviate the distribution shift incurred by the out-of-distribution (OOD) states. We design a novel 2D Multimodal Contextual Bandit environment to illustrate the OOD generalization of SRDP compared to prior algorithms. In addition, we assess the performance of our model on D4RL continuous control benchmarks, namely the navigation of an 8-DoF ant and forward locomotion of half-cheetah, hopper, and walker2d, achieving state-of-the-art results.  ( 2 min )
    Improving the Validity of Decision Trees as Explanations. (arXiv:2306.06777v3 [cs.LG] UPDATED)
    In classification and forecasting with tabular data, one often utilizes tree-based models. Those can be competitive with deep neural networks on tabular data [cf. Grinsztajn et al., NeurIPS 2022, arXiv:2207.08815] and, under some conditions, explainable. The explainability depends on the depth of the tree and the accuracy in each leaf of the tree. Decision trees containing leaves with unbalanced accuracy can provide misleading explanations. Low-accuracy leaves give less valid explanations, which could be interpreted as unfairness among explanations. Here, we train a shallow tree with the objective of minimizing the maximum misclassification error across each leaf node. Then, we extend each leaf with a separate tree-based model. The shallow tree provides a global explanation, while the overall statistical performance of the shallow tree with extended leaves improves upon decision trees of unlimited depth trained using classical methods (e.g., CART) and is comparable to state-of-the-art methods (e.g., well-tuned XGBoost).  ( 2 min )
    Dynamic Data Augmentation via MCTS for Prostate MRI Segmentation. (arXiv:2305.15777v2 [eess.IV] UPDATED)
    Medical image data are often limited due to the expensive acquisition and annotation process. Hence, training a deep-learning model with only raw data can easily lead to overfitting. One solution to this problem is to augment the raw data with various transformations, improving the model's ability to generalize to new data. However, manually configuring a generic augmentation combination and parameters for different datasets is non-trivial due to inconsistent acquisition approaches and data distributions. Therefore, automatic data augmentation is proposed to learn favorable augmentation strategies for different datasets while incurring large GPU overhead. To this end, we present a novel method, called Dynamic Data Augmentation (DDAug), which is efficient and has negligible computation cost. Our DDAug develops a hierarchical tree structure to represent various augmentations and utilizes an efficient Monte-Carlo tree searching algorithm to update, prune, and sample the tree. As a result, the augmentation pipeline can be optimized for each dataset automatically. Experiments on multiple Prostate MRI datasets show that our method outperforms the current state-of-the-art data augmentation strategies.
    Biclustering Methods via Sparse Penalty. (arXiv:2308.14388v2 [stat.ML] UPDATED)
    In this paper, we first reviewed several biclustering methods that are used to identify the most significant clusters in gene expression data. Here we mainly focused on the SSVD(sparse SVD) method and tried a new sparse penalty named "Prenet penalty" which has been used only in factor analysis to gain sparsity. Then in the simulation study, we tried different types of generated datasets (with different sparsity and dimension) and tried 1-layer approximation then for k-layers which shows the mixed Prenet penalty is very effective for non-overlapped data. Finally, we used some real gene expression data to show the behavior of our methods.
    Learning Delays in Spiking Neural Networks using Dilated Convolutions with Learnable Spacings. (arXiv:2306.17670v2 [cs.NE] UPDATED)
    Spiking Neural Networks (SNNs) are a promising research direction for building power-efficient information processing systems, especially for temporal tasks such as speech recognition. In SNNs, delays refer to the time needed for one spike to travel from one neuron to another. These delays matter because they influence the spike arrival times, and it is well-known that spiking neurons respond more strongly to coincident input spikes. More formally, it has been shown theoretically that plastic delays greatly increase the expressivity in SNNs. Yet, efficient algorithms to learn these delays have been lacking. Here, we propose a new discrete-time algorithm that addresses this issue in deep feedforward SNNs using backpropagation, in an offline manner. To simulate delays between consecutive layers, we use 1D convolutions across time. The kernels contain only a few non-zero weights - one per synapse - whose positions correspond to the delays. These positions are learned together with the weights using the recently proposed Dilated Convolution with Learnable Spacings (DCLS). We evaluated our method on three datasets: the Spiking Heidelberg Dataset (SHD), the Spiking Speech Commands (SSC) and its non-spiking version Google Speech Commands v0.02 (GSC) benchmarks, which require detecting temporal patterns. We used feedforward SNNs with two or three hidden fully connected layers, and vanilla leaky integrate-and fire neurons. We showed that fixed random delays help and that learning them helps even more. Furthermore, our method outperformed the state-of-the-art in the three datasets without using recurrent connections and with substantially fewer parameters. Our work demonstrates the potential of delay learning in developing accurate and precise models for temporal data processing. Our code is based on PyTorch / SpikingJelly and available at: https://github.com/Thvnvtos/SNN-delays
    Yet Another ICU Benchmark: A Flexible Multi-Center Framework for Clinical ML. (arXiv:2306.05109v2 [cs.LG] UPDATED)
    Medical applications of machine learning (ML) have experienced a surge in popularity in recent years. The intensive care unit (ICU) is a natural habitat for ML given the abundance of available data from electronic health records. Models have been proposed to address numerous ICU prediction tasks like the early detection of complications. While authors frequently report state-of-the-art performance, it is challenging to verify claims of superiority. Datasets and code are not always published, and cohort definitions, preprocessing pipelines, and training setups are difficult to reproduce. This work introduces Yet Another ICU Benchmark (YAIB), a modular framework that allows researchers to define reproducible and comparable clinical ML experiments; we offer an end-to-end solution from cohort definition to model evaluation. The framework natively supports most open-access ICU datasets (MIMIC III/IV, eICU, HiRID, AUMCdb) and is easily adaptable to future ICU datasets. Combined with a transparent preprocessing pipeline and extensible training code for multiple ML and deep learning models, YAIB enables unified model development. Our benchmark comes with five predefined established prediction tasks (mortality, acute kidney injury, sepsis, kidney function, and length of stay) developed in collaboration with clinicians. Adding further tasks is straightforward by design. Using YAIB, we demonstrate that the choice of dataset, cohort definition, and preprocessing have a major impact on the prediction performance - often more so than model class - indicating an urgent need for YAIB as a holistic benchmarking tool. We provide our work to the clinical ML community to accelerate method development and enable real-world clinical implementations. Software Repository: https://github.com/rvandewater/YAIB.
    MedShapeNet -- A Large-Scale Dataset of 3D Medical Shapes for Computer Vision. (arXiv:2308.16139v2 [cs.CV] UPDATED)
    We present MedShapeNet, a large collection of anatomical shapes (e.g., bones, organs, vessels) and 3D surgical instrument models. Prior to the deep learning era, the broad application of statistical shape models (SSMs) in medical image analysis is evidence that shapes have been commonly used to describe medical data. Nowadays, however, state-of-the-art (SOTA) deep learning algorithms in medical imaging are predominantly voxel-based. In computer vision, on the contrary, shapes (including, voxel occupancy grids, meshes, point clouds and implicit surface models) are preferred data representations in 3D, as seen from the numerous shape-related publications in premier vision conferences, such as the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), as well as the increasing popularity of ShapeNet (about 51,300 models) and Princeton ModelNet (127,915 models) in computer vision research. MedShapeNet is created as an alternative to these commonly used shape benchmarks to facilitate the translation of data-driven vision algorithms to medical applications, and it extends the opportunities to adapt SOTA vision algorithms to solve critical medical problems. Besides, the majority of the medical shapes in MedShapeNet are modeled directly on the imaging data of real patients, and therefore it complements well existing shape benchmarks comprising of computer-aided design (CAD) models. MedShapeNet currently includes more than 100,000 medical shapes, and provides annotations in the form of paired data. It is therefore also a freely available repository of 3D models for extended reality (virtual reality - VR, augmented reality - AR, mixed reality - MR) and medical 3D printing. This white paper describes in detail the motivations behind MedShapeNet, the shape acquisition procedures, the use cases, as well as the usage of the online shape search portal: https://medshapenet.ikim.nrw/  ( 4 min )
    Why Does Little Robustness Help? Understanding and Improving Adversarial Transferability from Surrogate Training. (arXiv:2307.07873v5 [cs.LG] UPDATED)
    Adversarial examples (AEs) for DNNs have been shown to be transferable: AEs that successfully fool white-box surrogate models can also deceive other black-box models with different architectures. Although a bunch of empirical studies have provided guidance on generating highly transferable AEs, many of these findings lack explanations and even lead to inconsistent advice. In this paper, we take a further step towards understanding adversarial transferability, with a particular focus on surrogate aspects. Starting from the intriguing little robustness phenomenon, where models adversarially trained with mildly perturbed adversarial samples can serve as better surrogates, we attribute it to a trade-off between two predominant factors: model smoothness and gradient similarity. Our investigations focus on their joint effects, rather than their separate correlations with transferability. Through a series of theoretical and empirical analyses, we conjecture that the data distribution shift in adversarial training explains the degradation of gradient similarity. Building on these insights, we explore the impacts of data augmentation and gradient regularization on transferability and identify that the trade-off generally exists in the various training mechanisms, thus building a comprehensive blueprint for the regulation mechanism behind transferability. Finally, we provide a general route for constructing better surrogates to boost transferability which optimizes both model smoothness and gradient similarity simultaneously, e.g., the combination of input gradient regularization and sharpness-aware minimization (SAM), validated by extensive experiments. In summary, we call for attention to the united impacts of these two factors for launching effective transfer attacks, rather than optimizing one while ignoring the other, and emphasize the crucial role of manipulating surrogate models.  ( 3 min )
    MaxViT-UNet: Multi-Axis Attention for Medical Image Segmentation. (arXiv:2305.08396v4 [eess.IV] UPDATED)
    In this work, we present MaxViT-UNet, an Encoder-Decoder based hybrid vision transformer (CNN-Transformer) for medical image segmentation. The proposed Hybrid Decoder, based on MaxViT-block, is designed to harness the power of both the convolution and self-attention mechanisms at each decoding stage with a nominal memory and computational burden. The inclusion of multi-axis self-attention, within each decoder stage, significantly enhances the discriminating capacity between the object and background regions, thereby helping in improving the segmentation efficiency. In the Hybrid Decoder block, the fusion process commences by integrating the upsampled lower-level decoder features, obtained through transpose convolution, with the skip-connection features derived from the hybrid encoder. Subsequently, the fused features undergo refinement through the utilization of a multi-axis attention mechanism. The proposed decoder block is repeated multiple times to progressively segment the nuclei regions. Experimental results on MoNuSeg18 and MoNuSAC20 dataset demonstrates the effectiveness of the proposed technique. Our MaxViT-UNet outperformed the previous CNN-based (UNet) and Transformer-based (Swin-UNet) techniques by a considerable margin on both of the standard datasets. The following github (https://github.com/PRLAB21/MaxViT-UNet) contains the implementation and trained weights.  ( 2 min )
    Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models. (arXiv:2305.10474v2 [cs.CV] UPDATED)
    Despite tremendous progress in generating high-quality images using diffusion models, synthesizing a sequence of animated frames that are both photorealistic and temporally coherent is still in its infancy. While off-the-shelf billion-scale datasets for image generation are available, collecting similar video data of the same scale is still challenging. Also, training a video diffusion model is computationally much more expensive than its image counterpart. In this work, we explore finetuning a pretrained image diffusion model with video data as a practical solution for the video synthesis task. We find that naively extending the image noise prior to video noise prior in video diffusion leads to sub-optimal performance. Our carefully designed video noise prior leads to substantially better performance. Extensive experimental validation shows that our model, Preserve Your Own Correlation (PYoCo), attains SOTA zero-shot text-to-video results on the UCF-101 and MSR-VTT benchmarks. It also achieves SOTA video generation quality on the small-scale UCF-101 benchmark with a $10\times$ smaller model using significantly less computation than the prior art.  ( 2 min )
    Neural Mixed Effects for Nonlinear Personalized Predictions. (arXiv:2306.08149v3 [cs.LG] UPDATED)
    Personalized prediction is a machine learning approach that predicts a person's future observations based on their past labeled observations and is typically used for sequential tasks, e.g., to predict daily mood ratings. When making personalized predictions, a model can combine two types of trends: (a) trends shared across people, i.e., person-generic trends, such as being happier on weekends, and (b) unique trends for each person, i.e., person-specific trends, such as a stressful weekly meeting. Mixed effect models are popular statistical models to study both trends by combining person-generic and person-specific parameters. Though linear mixed effect models are gaining popularity in machine learning by integrating them with neural networks, these integrations are currently limited to linear person-specific parameters: ruling out nonlinear person-specific trends. In this paper, we propose Neural Mixed Effect (NME) models to optimize nonlinear person-specific parameters anywhere in a neural network in a scalable manner. NME combines the efficiency of neural network optimization with nonlinear mixed effects modeling. Empirically, we observe that NME improves performance across six unimodal and multimodal datasets, including a smartphone dataset to predict daily mood and a mother-adolescent dataset to predict affective state sequences where half the mothers experience at least moderate symptoms of depression. Furthermore, we evaluate NME for two model architectures, including for neural conditional random fields (CRF) to predict affective state sequences where the CRF learns nonlinear person-specific temporal transitions between affective states. Analysis of these person-specific transitions on the mother-adolescent dataset shows interpretable trends related to the mother's depression symptoms.  ( 3 min )
    Mixed-type Distance Shrinkage and Selection for Clustering via Kernel Metric Learning. (arXiv:2306.01890v2 [cs.LG] UPDATED)
    Distance-based clustering and classification are widely used in various fields to group mixed numeric and categorical data. In many algorithms, a predefined distance measurement is used to cluster data points based on their dissimilarity. While there exist numerous distance-based measures for data with pure numerical attributes and several ordered and unordered categorical metrics, an efficient and accurate distance for mixed-type data that utilizes the continuous and discrete properties simulatenously is an open problem. Many metrics convert numerical attributes to categorical ones or vice versa. They handle the data points as a single attribute type or calculate a distance between each attribute separately and add them up. We propose a metric called KDSUM that uses mixed kernels to measure dissimilarity, with cross-validated optimal bandwidth selection. We demonstrate that KDSUM is a shrinkage method from existing mixed-type metrics to a uniform dissimilarity metric, and improves clustering accuracy when utilized in existing distance-based clustering algorithms on simulated and real-world datasets containing continuous-only, categorical-only, and mixed-type data.  ( 2 min )
    Multi-Response Heteroscedastic Gaussian Process Models and Their Inference. (arXiv:2308.15370v2 [stat.ML] UPDATED)
    Despite the widespread utilization of Gaussian process models for versatile nonparametric modeling, they exhibit limitations in effectively capturing abrupt changes in function smoothness and accommodating relationships with heteroscedastic errors. Addressing these shortcomings, the heteroscedastic Gaussian process (HeGP) regression seeks to introduce flexibility by acknowledging the variability of residual variances across covariates in the regression model. In this work, we extend the HeGP concept, expanding its scope beyond regression tasks to encompass classification and state-space models. To achieve this, we propose a novel framework where the Gaussian process is coupled with a covariate-induced precision matrix process, adopting a mixture formulation. This approach enables the modeling of heteroscedastic covariance functions across covariates. To mitigate the computational challenges posed by sampling, we employ variational inference to approximate the posterior and facilitate posterior predictive modeling. Additionally, our training process leverages an EM algorithm featuring closed-form M-step updates to efficiently evaluate the heteroscedastic covariance function. A notable feature of our model is its consistent performance on multivariate responses, accommodating various types (continuous or categorical) seamlessly. Through a combination of simulations and real-world applications in climatology, we illustrate the model's prowess and advantages. By overcoming the limitations of traditional Gaussian process models, our proposed framework offers a robust and versatile tool for a wide array of applications.  ( 2 min )
    Speeding up Fourier Neural Operators via Mixed Precision. (arXiv:2307.15034v2 [cs.LG] UPDATED)
    The Fourier neural operator (FNO) is a powerful technique for learning surrogate maps for partial differential equation (PDE) solution operators. For many real-world applications, which often require high-resolution data points, training time and memory usage are significant bottlenecks. While there are mixed-precision training techniques for standard neural networks, those work for real-valued datatypes on finite dimensions and therefore cannot be directly applied to FNO, which crucially operates in the (complex-valued) Fourier domain and in function spaces. On the other hand, since the Fourier transform is already an approximation (due to discretization error), we do not need to perform the operation at full precision. In this work, we (i) profile memory and runtime for FNO with full and mixed-precision training, (ii) conduct a study on the numerical stability of mixed-precision training of FNO, and (iii) devise a training routine which substantially decreases training time and memory usage (up to 34%), with little or no reduction in accuracy, on the Navier-Stokes and Darcy flow equations. Combined with the recently proposed tensorized FNO (Kossaifi et al., 2023), the resulting model has far better performance while also being significantly faster than the original FNO.  ( 2 min )
    The Role of Diverse Replay for Generalisation in Reinforcement Learning. (arXiv:2306.05727v2 [cs.LG] UPDATED)
    In reinforcement learning (RL), key components of many algorithms are the exploration strategy and replay buffer. These strategies regulate what environment data is collected and trained on and have been extensively studied in the RL literature. In this paper, we investigate the impact of these components in the context of generalisation in multi-task RL. We investigate the hypothesis that collecting and training on more diverse data from the training environments will improve zero-shot generalisation to new tasks. We motivate mathematically and show empirically that generalisation to tasks that are "reachable'' during training is improved by increasing the diversity of transitions in the replay buffer. Furthermore, we show empirically that this same strategy also shows improvement for generalisation to similar but "unreachable'' tasks which could be due to improved generalisation of the learned latent representations.  ( 2 min )
    pTSE: A Multi-model Ensemble Method for Probabilistic Time Series Forecasting. (arXiv:2305.11304v2 [cs.LG] UPDATED)
    Various probabilistic time series forecasting models have sprung up and shown remarkably good performance. However, the choice of model highly relies on the characteristics of the input time series and the fixed distribution that the model is based on. Due to the fact that the probability distributions cannot be averaged over different models straightforwardly, the current time series model ensemble methods cannot be directly applied to improve the robustness and accuracy of forecasting. To address this issue, we propose pTSE, a multi-model distribution ensemble method for probabilistic forecasting based on Hidden Markov Model (HMM). pTSE only takes off-the-shelf outputs from member models without requiring further information about each model. Besides, we provide a complete theoretical analysis of pTSE to prove that the empirical distribution of time series subject to an HMM will converge to the stationary distribution almost surely. Experiments on benchmarks show the superiority of pTSE overall member models and competitive ensemble methods.  ( 2 min )
    Multi-Modal Discussion Transformer: Integrating Text, Images and Graph Transformers to Detect Hate Speech on Social Media. (arXiv:2307.09312v2 [cs.CL] UPDATED)
    We present the Multi-Modal Discussion Transformer (mDT), a novel multi-modal graph-based transformer model for detecting hate speech in online social networks, such as Reddit discussions. In contrast to traditional comment-only methods, our approach to labelling a comment as hate speech involves a holistic analysis of text and images grounded in the discussion context. This is done by leveraging graph transformers to capture the contextual relationships in the entire discussion surrounding a comment and grounding the interwoven fusion layers that combine individual comments' text and image embeddings instead of processing modalities separately. We compare the performance of our model to baselines that only process individual comments and conduct extensive ablation studies. To evaluate our work, we present a new dataset, HatefulDiscussions, comprising complete multi-modal discussions from multiple online communities on Reddit. We conclude with future work for multimodal solutions to deliver social value in online contexts, arguing that capturing a holistic view of a conversation significantly advances the effort to detect anti-social behaviour.  ( 2 min )
    Generative Sliced MMD Flows with Riesz Kernels. (arXiv:2305.11463v2 [cs.LG] UPDATED)
    Maximum mean discrepancy (MMD) flows suffer from high computational costs in large scale computations. In this paper, we show that MMD flows with Riesz kernels $K(x,y) = - \Vert x-y\Vert^r$, $r \in (0,2)$ have exceptional properties which allow their efficient computation. We prove that the MMD of Riesz kernels coincides with the MMD of their sliced version. As a consequence, the computation of gradients of MMDs can be performed in the one-dimensional setting. Here, for $r=1$, a simple sorting algorithm can be applied to reduce the complexity from $O(MN+N^2)$ to $O((M+N)\log(M+N))$ for two measures with $M$ and $N$ support points. As another interesting follow-up result, the MMD of compactly supported measures can be estimated from above and below by the Wasserstein-1 distance. For the implementations we approximate the gradient of the sliced MMD by using only a finite number $P$ of slices. We show that the resulting error has complexity $O(\sqrt{d/P})$, where $d$ is the data dimension. These results enable us to train generative models by approximating MMD gradient flows by neural networks even for image applications. We demonstrate the efficiency of our model by image generation on MNIST, FashionMNIST and CIFAR10.  ( 2 min )
    Data-driven Predictive Latency for 5G: A Theoretical and Experimental Analysis Using Network Measurements. (arXiv:2307.02329v3 [cs.NI] UPDATED)
    The advent of novel 5G services and applications with binding latency requirements and guaranteed Quality of Service (QoS) hastened the need to incorporate autonomous and proactive decision-making in network management procedures. The objective of our study is to provide a thorough analysis of predictive latency within 5G networks by utilizing real-world network data that is accessible to mobile network operators (MNOs). In particular, (i) we present an analytical formulation of the user-plane latency as a Hypoexponential distribution, which is validated by means of a comparative analysis with empirical measurements, and (ii) we conduct experimental results of probabilistic regression, anomaly detection, and predictive forecasting leveraging on emerging domains in Machine Learning (ML), such as Bayesian Learning (BL) and Machine Learning on Graphs (GML). We test our predictive framework using data gathered from scenarios of vehicular mobility, dense-urban traffic, and social gathering events. Our results provide valuable insights into the efficacy of predictive algorithms in practical applications.  ( 2 min )
    DNAGPT: A Generalized Pre-trained Tool for Versatile DNA Sequence Analysis Tasks. (arXiv:2307.05628v3 [q-bio.GN] UPDATED)
    Pre-trained large language models demonstrate potential in extracting information from DNA sequences, yet adapting to a variety of tasks and data modalities remains a challenge. To address this, we propose DNAGPT, a generalized DNA pre-training model trained on over 200 billion base pairs from all mammals. By enhancing the classic GPT model with a binary classification task (DNA sequence order), a numerical regression task (guanine-cytosine content prediction), and a comprehensive token language, DNAGPT can handle versatile DNA analysis tasks while processing both sequence and numerical data. Our evaluation of genomic signal and region recognition, mRNA abundance regression, and artificial genomes generation tasks demonstrates DNAGPT's superior performance compared to existing models designed for specific downstream tasks, benefiting from pre-training using the newly designed model structure.  ( 2 min )
    Edge Inference with Fully Differentiable Quantized Mixed Precision Neural Networks. (arXiv:2206.07741v2 [cs.LG] UPDATED)
    The large computing and memory cost of deep neural networks (DNNs) often precludes their use in resource-constrained devices. Quantizing the parameters and operations to lower bit-precision offers substantial memory and energy savings for neural network inference, facilitating the use of DNNs on edge computing platforms. Recent efforts at quantizing DNNs have employed a range of techniques encompassing progressive quantization, step-size adaptation, and gradient scaling. This paper proposes a new quantization approach for mixed precision convolutional neural networks (CNNs) targeting edge-computing. Our method establishes a new pareto frontier in model accuracy and memory footprint demonstrating a range of quantized models, delivering best-in-class accuracy below 4.3 MB of weights (wgts.) and activations (acts.). Our main contributions are: (i) hardware-aware heterogeneous differentiable quantization with tensor-sliced learned precision, (ii) targeted gradient modification for wgts. and acts. to mitigate quantization errors, and (iii) a multi-phase learning schedule to address instability in learning arising from updates to the learned quantizer and model parameters. We demonstrate the effectiveness of our techniques on the ImageNet dataset across a range of models including EfficientNet-Lite0 (e.g., 4.14MB of wgts. and acts. at 67.66% accuracy) and MobileNetV2 (e.g., 3.51MB wgts. and acts. at 65.39% accuracy).  ( 2 min )
    Seeking Interpretability and Explainability in Binary Activated Neural Networks. (arXiv:2209.03450v2 [cs.LG] UPDATED)
    We study the use of binary activated neural networks as interpretable and explainable predictors in the context of regression tasks on tabular data; more specifically, we provide guarantees on their expressiveness, present an approach based on the efficient computation of SHAP values for quantifying the relative importance of the features, hidden neurons and even weights. As the model's simplicity is instrumental in achieving interpretability, we propose a greedy algorithm for building compact binary activated networks. This approach doesn't need to fix an architecture for the network in advance: it is built one layer at a time, one neuron at a time, leading to predictors that aren't needlessly complex for a given task.  ( 2 min )
    Inferring Traffic Models in Terminal Airspace from Flight Tracks and Procedures. (arXiv:2303.09981v2 [cs.LG] UPDATED)
    Realistic aircraft trajectory models are useful in the design and validation of air traffic management (ATM) systems. Models of aircraft operated under instrument flight rules (IFR) require capturing the variability inherent in how aircraft follow standard flight procedures. The variability in aircraft behavior varies among flight stages. In this paper, we propose a probabilistic model that can learn the variability from the procedural data and flight tracks collected from radar surveillance data. For each segment, a Gaussian mixture model is used to learn the deviations of aircraft trajectories from their procedures. Given new procedures, we can generate synthetic trajectories by sampling a series of deviations from the trained Gaussian distributions and reconstructing the aircraft trajectory using the deviations and the procedures. We extend this method to capture pairwise correlations between aircraft and show how a pairwise model can be used to generate traffic involving an arbitrary number of aircraft. We demonstrate the proposed models on the arrival tracks and procedures of the John F. Kennedy International Airport. The distributional similarity between the original and the synthetic trajectory dataset was evaluated using the Jensen-Shannon divergence between the empirical distributions of different variables. We also provide qualitative analyses of the synthetic trajectories generated from the models.  ( 2 min )
    Collage Diffusion. (arXiv:2303.00262v2 [cs.CV] UPDATED)
    We seek to give users precise control over diffusion-based image generation by modeling complex scenes as sequences of layers, which define the desired spatial arrangement and visual attributes of objects in the scene. Collage Diffusion harmonizes the input layers to make objects fit together -- the key challenge involves minimizing changes in the positions and key visual attributes of the input layers while allowing other attributes to change in the harmonization process. We ensure that objects are generated in the correct locations by modifying text-image cross-attention with the layers' alpha masks. We preserve key visual attributes of input layers by learning specialized text representations per layer and by extending ControlNet to operate on layers. Layer input allows users to control the extent of image harmonization on a per-object basis, and users can even iteratively edit individual objects in generated images while keeping other objects fixed. By leveraging the rich information present in layer input, Collage Diffusion generates globally harmonized images that maintain desired object characteristics better than prior approaches.  ( 2 min )
    Fair Attribute Completion on Graph with Missing Attributes. (arXiv:2302.12977v3 [cs.LG] UPDATED)
    Tackling unfairness in graph learning models is a challenging task, as the unfairness issues on graphs involve both attributes and topological structures. Existing work on fair graph learning simply assumes that attributes of all nodes are available for model training and then makes fair predictions. In practice, however, the attributes of some nodes might not be accessible due to missing data or privacy concerns, which makes fair graph learning even more challenging. In this paper, we propose FairAC, a fair attribute completion method, to complement missing information and learn fair node embeddings for graphs with missing attributes. FairAC adopts an attention mechanism to deal with the attribute missing problem and meanwhile, it mitigates two types of unfairness, i.e., feature unfairness from attributes and topological unfairness due to attribute completion. FairAC can work on various types of homogeneous graphs and generate fair embeddings for them and thus can be applied to most downstream tasks to improve their fairness performance. To our best knowledge, FairAC is the first method that jointly addresses the graph attribution completion and graph unfairness problems. Experimental results on benchmark datasets show that our method achieves better fairness performance with less sacrifice in accuracy, compared with the state-of-the-art methods of fair graph learning. Code is available at: https://github.com/donglgcn/FairAC.  ( 2 min )
    From Chaos Comes Order: Ordering Event Representations for Object Recognition and Detection. (arXiv:2304.13455v4 [cs.CV] UPDATED)
    Today, state-of-the-art deep neural networks that process events first convert them into dense, grid-like input representations before using an off-the-shelf network. However, selecting the appropriate representation for the task traditionally requires training a neural network for each representation and selecting the best one based on the validation score, which is very time-consuming. This work eliminates this bottleneck by selecting representations based on the Gromov-Wasserstein Discrepancy (GWD) between raw events and their representation. It is about 200 times faster to compute than training a neural network and preserves the task performance ranking of event representations across multiple representations, network backbones, datasets, and tasks. Thus finding representations with high task scores is equivalent to finding representations with a low GWD. We use this insight to, for the first time, perform a hyperparameter search on a large family of event representations, revealing new and powerful representations that exceed the state-of-the-art. Our optimized representations outperform existing representations by 1.7 mAP on the 1 Mpx dataset and 0.3 mAP on the Gen1 dataset, two established object detection benchmarks, and reach a 3.8% higher classification score on the mini N-ImageNet benchmark. Moreover, we outperform state-of-the-art by 2.1 mAP on Gen1 and state-of-the-art feed-forward methods by 6.0 mAP on the 1 Mpx datasets. This work opens a new unexplored field of explicit representation optimization for event-based learning.  ( 3 min )
    Learning to Taste: A Multimodal Wine Dataset. (arXiv:2308.16900v1 [cs.LG])
    We present WineSensed, a large multimodal wine dataset for studying the relations between visual perception, language, and flavor. The dataset encompasses 897k images of wine labels and 824k reviews of wines curated from the Vivino platform. It has over 350k unique vintages, annotated with year, region, rating, alcohol percentage, price, and grape composition. We obtained fine-grained flavor annotations on a subset by conducting a wine-tasting experiment with 256 participants who were asked to rank wines based on their similarity in flavor, resulting in more than 5k pairwise flavor distances. We propose a low-dimensional concept embedding algorithm that combines human experience with automatic machine similarity kernels. We demonstrate that this shared concept embedding space improves upon separate embedding spaces for coarse flavor classification (alcohol percentage, country, grape, price, rating) and aligns with the intricate human perception of flavor.  ( 2 min )
    GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields. (arXiv:2308.16891v1 [cs.RO])
    It is a long-standing problem in robotics to develop agents capable of executing diverse manipulation tasks from visual observations in unstructured real-world environments. To achieve this goal, the robot needs to have a comprehensive understanding of the 3D structure and semantics of the scene. In this work, we present $\textbf{GNFactor}$, a visual behavior cloning agent for multi-task robotic manipulation with $\textbf{G}$eneralizable $\textbf{N}$eural feature $\textbf{F}$ields. GNFactor jointly optimizes a generalizable neural field (GNF) as a reconstruction module and a Perceiver Transformer as a decision-making module, leveraging a shared deep 3D voxel representation. To incorporate semantics in 3D, the reconstruction module utilizes a vision-language foundation model ($\textit{e.g.}$, Stable Diffusion) to distill rich semantic information into the deep 3D voxel. We evaluate GNFactor on 3 real robot tasks and perform detailed ablations on 10 RLBench tasks with a limited number of demonstrations. We observe a substantial improvement of GNFactor over current state-of-the-art methods in seen and unseen tasks, demonstrating the strong generalization ability of GNFactor. Our project website is https://yanjieze.com/GNFactor/ .  ( 2 min )
    Hypergraph Structure Inference From Data Under Smoothness Prior. (arXiv:2308.14172v2 [cs.LG] UPDATED)
    Hypergraphs are important for processing data with higher-order relationships involving more than two entities. In scenarios where explicit hypergraphs are not readily available, it is desirable to infer a meaningful hypergraph structure from the node features to capture the intrinsic relations within the data. However, existing methods either adopt simple pre-defined rules that fail to precisely capture the distribution of the potential hypergraph structure, or learn a mapping between hypergraph structures and node features but require a large amount of labelled data, i.e., pre-existing hypergraph structures, for training. Both restrict their applications in practical scenarios. To fill this gap, we propose a novel smoothness prior that enables us to design a method to infer the probability for each potential hyperedge without labelled data as supervision. The proposed prior indicates features of nodes in a hyperedge are highly correlated by the features of the hyperedge containing them. We use this prior to derive the relation between the hypergraph structure and the node features via probabilistic modelling. This allows us to develop an unsupervised inference method to estimate the probability for each potential hyperedge via solving an optimisation problem that has an analytical solution. Experiments on both synthetic and real-world data demonstrate that our method can learn meaningful hypergraph structures from data more efficiently than existing hypergraph structure inference methods.
    StyleGAN as a Utility-Preserving Face De-identification Method. (arXiv:2212.02611v2 [cs.CV] UPDATED)
    Face de-identification methods have been proposed to preserve users' privacy by obscuring their faces. These methods, however, can degrade the quality of photos, and they usually do not preserve the utility of faces, i.e., their age, gender, pose, and facial expression. Recently, GANs, such as StyleGAN, have been proposed, which generate realistic, high-quality imaginary faces. In this paper, we investigate the use of StyleGAN in generating de-identified faces through style mixing. We examined this de-identification method for preserving utility and privacy by implementing several face detection, verification, and identification attacks and conducting a user study. The results from our extensive experiments, human evaluation, and comparison with two state-of-the-art methods, i.e., CIAGAN and DeepPrivacy, show that StyleGAN performs on par or better than these methods, preserving users' privacy and images' utility. In particular, the results of the machine learning-based experiments show that StyleGAN0-4 preserves utility better than CIAGAN and DeepPrivacy while preserving privacy at the same level. StyleGAN0-3 preserves utility at the same level while providing more privacy. In this paper, for the first time, we also performed a carefully designed user study to examine both privacy and utility-preserving properties of StyleGAN0-3, 0-4, and 0-5, as well as CIAGAN and DeepPrivacy from the human observers' perspectives. Our statistical tests showed that participants tend to verify and identify StyleGAN0-5 images more easily than DeepPrivacy images. All the methods but StyleGAN0-5 had significantly lower identification rates than CIAGAN. Regarding utility, as expected, StyleGAN0-5 performed significantly better in preserving some attributes. Among all methods, on average, participants believe gender has been preserved the most while naturalness has been preserved the least.
    Stochastic Configuration Machines for Industrial Artificial Intelligence. (arXiv:2308.13570v2 [cs.LG] UPDATED)
    Real-time predictive modelling with desired accuracy is highly expected in industrial artificial intelligence (IAI), where neural networks play a key role. Neural networks in IAI require powerful, high-performance computing devices to operate a large number of floating point data. Based on stochastic configuration networks (SCNs), this paper proposes a new randomized learner model, termed stochastic configuration machines (SCMs), to stress effective modelling and data size saving that are useful and valuable for industrial applications. Compared to SCNs and random vector functional-link (RVFL) nets with binarized implementation, the model storage of SCMs can be significantly compressed while retaining favourable prediction performance. Besides the architecture of the SCM learner model and its learning algorithm, as an important part of this contribution, we also provide a theoretical basis on the learning capacity of SCMs by analysing the model's complexity. Experimental studies are carried out over some benchmark datasets and three industrial applications. The results demonstrate that SCM has great potential for dealing with industrial data analytics.
    RBA-GCN: Relational Bilevel Aggregation Graph Convolutional Network for Emotion Recognition. (arXiv:2308.11029v2 [cs.AI] UPDATED)
    Emotion recognition in conversation (ERC) has received increasing attention from researchers due to its wide range of applications.As conversation has a natural graph structure,numerous approaches used to model ERC based on graph convolutional networks (GCNs) have yielded significant results.However,the aggregation approach of traditional GCNs suffers from the node information redundancy problem,leading to node discriminant information loss.Additionally,single-layer GCNs lack the capacity to capture long-range contextual information from the graph. Furthermore,the majority of approaches are based on textual modality or stitching together different modalities, resulting in a weak ability to capture interactions between modalities. To address these problems, we present the relational bilevel aggregation graph convolutional network (RBA-GCN), which consists of three modules: the graph generation module (GGM), similarity-based cluster building module (SCBM) and bilevel aggregation module (BiAM). First, GGM constructs a novel graph to reduce the redundancy of target node information.Then,SCBM calculates the node similarity in the target node and its structural neighborhood, where noisy information with low similarity is filtered out to preserve the discriminant information of the node. Meanwhile, BiAM is a novel aggregation method that can preserve the information of nodes during the aggregation process. This module can construct the interaction between different modalities and capture long-range contextual information based on similarity clusters. On both the IEMOCAP and MELD datasets, the weighted average F1 score of RBA-GCN has a 2.17$\sim$5.21\% improvement over that of the most advanced method.Our code is available at https://github.com/luftmenscher/RBA-GCN and our article with the same name has been published in IEEE/ACM Transactions on Audio,Speech,and Language Processing,vol.31,2023
    xxMD: Benchmarking Neural Force Fields Using Extended Dynamics beyond Equilibrium. (arXiv:2308.11155v2 [cs.LG] UPDATED)
    Neural force fields (NFFs) have gained prominence in computational chemistry as surrogate models, superseding quantum-chemistry calculations in ab initio molecular dynamics. The prevalent benchmark for NFFs has been the MD17 dataset and its subsequent extension. These datasets predominantly comprise geometries from the equilibrium region of the ground electronic state potential energy surface, sampling from direct adiabatic dynamics. However, many chemical reactions entail significant molecular deformations, notably bond breaking. We demonstrate the constrained distribution of internal coordinates and energies in the MD17 datasets, underscoring their inadequacy for representing systems undergoing chemical reactions. Addressing this sampling limitation, we introduce the xxMD (Extended Excited-state Molecular Dynamics) dataset, derived from non-adiabatic dynamics. This dataset encompasses energies and forces ascertained from both multireference wave function theory and density functional theory. Furthermore, its nuclear configuration spaces authentically depict chemical reactions, making xxMD a more chemically relevant dataset. Our re-assessment of equivariant models on the xxMD datasets reveals notably higher mean absolute errors than those reported for MD17 and its variants. This observation underscores the challenges faced in crafting a generalizable NFF model with extrapolation capability. Our proposed xxMD-CASSCF and xxMD-DFT datasets are available at https://github.com/zpengmei/xxMD.
    Expressive Text-to-Image Generation with Rich Text. (arXiv:2304.06720v2 [cs.CV] UPDATED)
    Plain text has become a prevalent interface for text-to-image synthesis. However, its limited customization options hinder users from accurately describing desired outputs. For example, plain text makes it hard to specify continuous quantities, such as the precise RGB color value or importance of each word. Furthermore, creating detailed text prompts for complex scenes is tedious for humans to write and challenging for text encoders to interpret. To address these challenges, we propose using a rich-text editor supporting formats such as font style, size, color, and footnote. We extract each word's attributes from rich text to enable local style control, explicit token reweighting, precise color rendering, and detailed region synthesis. We achieve these capabilities through a region-based diffusion process. We first obtain each word's region based on attention maps of a diffusion process using plain text. For each region, we enforce its text attributes by creating region-specific detailed prompts and applying region-specific guidance, and maintain its fidelity against plain-text generation through region-based injections. We present various examples of image generation from rich text and demonstrate that our method outperforms strong baselines with quantitative evaluations.
    A Stable and Scalable Method for Solving Initial Value PDEs with Neural Networks. (arXiv:2304.14994v2 [cs.LG] UPDATED)
    Unlike conventional grid and mesh based methods for solving partial differential equations (PDEs), neural networks have the potential to break the curse of dimensionality, providing approximate solutions to problems where using classical solvers is difficult or impossible. While global minimization of the PDE residual over the network parameters works well for boundary value problems, catastrophic forgetting impairs the applicability of this approach to initial value problems (IVPs). In an alternative local-in-time approach, the optimization problem can be converted into an ordinary differential equation (ODE) on the network parameters and the solution propagated forward in time; however, we demonstrate that current methods based on this approach suffer from two key issues. First, following the ODE produces an uncontrolled growth in the conditioning of the problem, ultimately leading to unacceptably large numerical errors. Second, as the ODE methods scale cubically with the number of model parameters, they are restricted to small neural networks, significantly limiting their ability to represent intricate PDE initial conditions and solutions. Building on these insights, we develop Neural IVP, an ODE based IVP solver which prevents the network from getting ill-conditioned and runs in time linear in the number of parameters, enabling us to evolve the dynamics of challenging PDEs with neural networks.
    Symmetry-Preserving Program Representations for Learning Code Semantics. (arXiv:2308.03312v5 [cs.LG] UPDATED)
    Large Language Models (LLMs) have shown promise in automated program reasoning, a crucial aspect of many security tasks. However, existing LLM architectures for code are often borrowed from other domains like natural language processing, raising concerns about their generalization and robustness to unseen code. A key generalization challenge is to incorporate the knowledge of code semantics, including control and data flow, into the LLM architectures. Drawing inspiration from examples of convolution layers exploiting translation symmetry, we explore how code symmetries can enhance LLM architectures for program analysis and modeling. We present a rigorous group-theoretic framework that formally defines code symmetries as semantics-preserving transformations and provides techniques for precisely reasoning about symmetry preservation within LLM architectures. Using this framework, we introduce a novel variant of self-attention that preserves program symmetries, demonstrating its effectiveness in generalization and robustness through detailed experimental evaluations across different binary and source code analysis tasks. Overall, our code symmetry framework offers rigorous and powerful reasoning techniques that can guide the future development of specialized LLMs for code and advance LLM-guided program reasoning tasks.
    Adaptive Uncertainty-Guided Model Selection for Data-Driven PDE Discovery. (arXiv:2308.10283v2 [cs.LG] UPDATED)
    We propose a new parameter-adaptive uncertainty-penalized Bayesian information criterion (UBIC) to prioritize the parsimonious partial differential equation (PDE) that sufficiently governs noisy spatial-temporal observed data with few reliable terms. Since the naive use of the BIC for model selection has been known to yield an undesirable overfitted PDE, the UBIC penalizes the found PDE not only by its complexity but also the quantified uncertainty, derived from the model supports' coefficient of variation in a probabilistic view. We also introduce physics-informed neural network learning as a simulation-based approach to further validate the selected PDE flexibly against the other discovered PDE. Numerical results affirm the successful application of the UBIC in identifying the true governing PDE. Additionally, we reveal an interesting effect of denoising the observed data on improving the trade-off between the BIC score and model complexity. Code is available at https://github.com/Pongpisit-Thanasutives/UBIC.
    Pre-Training Representations of Binary Code Using Contrastive Learning. (arXiv:2210.05102v2 [cs.SE] UPDATED)
    Compiled software is delivered as executable binary code. Developers write source code to express the software semantics, but the compiler converts it to a binary format that the CPU can directly execute. Therefore, binary code analysis is critical to applications in reverse engineering and computer security tasks where source code is not available. However, unlike source code and natural language that contain rich semantic information, binary code is typically difficult for human engineers to understand and analyze. While existing work uses AI models to assist source code analysis, few studies have considered binary code. In this paper, we propose a COntrastive learning Model for Binary cOde Analysis, or COMBO, that incorporates source code and comment information into binary code during representation learning. Specifically, we present three components in COMBO: (1) a primary contrastive learning method for cold-start pre-training, (2) a simplex interpolation method to incorporate source code, comments, and binary code, and (3) an intermediate representation learning algorithm to provide binary code embeddings. Finally, we evaluate the effectiveness of the pre-trained representations produced by COMBO using three indicative downstream tasks relating to binary code: algorithmic functionality classification, binary code similarity, and vulnerability detection. Our experimental results show that COMBO facilitates representation learning of binary code visualized by distribution analysis, and improves the performance on all three downstream tasks by 5.45% on average compared to state-of-the-art large-scale language representation models. To the best of our knowledge, COMBO is the first language representation model that incorporates source code, binary code, and comments into contrastive code representation learning and unifies multiple tasks for binary code analysis.
    Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning. (arXiv:2303.08566v2 [cs.CV] UPDATED)
    Visual Parameter-Efficient Fine-Tuning (PEFT) has become a powerful alternative for full fine-tuning so as to adapt pre-trained vision models to downstream tasks, which only tunes a small number of parameters while freezing the vast majority ones to ease storage burden and optimization difficulty. However, existing PEFT methods introduce trainable parameters to the same positions across different tasks depending solely on human heuristics and neglect the domain gaps. To this end, we study where to introduce and how to allocate trainable parameters by proposing a novel Sensitivity-aware visual Parameter-efficient fine-Tuning (SPT) scheme, which adaptively allocates trainable parameters to task-specific important positions given a desired tunable parameter budget. Specifically, our SPT first quickly identifies the sensitive parameters that require tuning for a given task in a data-dependent way. Next, our SPT further boosts the representational capability for the weight matrices whose number of sensitive parameters exceeds a pre-defined threshold by utilizing existing structured tuning methods, e.g., LoRA [23] or Adapter [22], to replace directly tuning the selected sensitive parameters (unstructured tuning) under the budget. Extensive experiments on a wide range of downstream recognition tasks show that our SPT is complementary to the existing PEFT methods and largely boosts their performance, e.g., SPT improves Adapter with supervised pre-trained ViT-B/16 backbone by 4.2% and 1.4% mean Top-1 accuracy, reaching SOTA performance on FGVC and VTAB-1k benchmarks, respectively. Source code is at https://github.com/ziplab/SPT
    Quantization-based Optimization with Perspective of Quantum Mechanics. (arXiv:2308.11594v2 [quant-ph] UPDATED)
    Statistical and stochastic analysis based on thermodynamics has been the main analysis framework for stochastic global optimization. Recently, appearing quantum annealing or quantum tunneling algorithm for global optimization, we require a new researching framework for global optimization algorithms. In this paper, we provide the analysis for quantization-based optimization based on the Schr\"odinger equation to reveal what property in quantum mechanics enables global optimization. We present that the tunneling effect derived by the Schr\"odinger equation in quantization-based optimization enables to escape of a local minimum. Additionally, we confirm that this tunneling effect is the same property included in quantum mechanics-based global optimization. Experiments with standard multi-modal benchmark functions represent that the proposed analysis is valid.
    Flexible Phase Dynamics for Bio-Plausible Contrastive Learning. (arXiv:2302.12431v2 [cs.LG] UPDATED)
    Many learning algorithms used as normative models in neuroscience or as candidate approaches for learning on neuromorphic chips learn by contrasting one set of network states with another. These Contrastive Learning (CL) algorithms are traditionally implemented with rigid, temporally non-local, and periodic learning dynamics that could limit the range of physical systems capable of harnessing CL. In this study, we build on recent work exploring how CL might be implemented by biological or neurmorphic systems and show that this form of learning can be made temporally local, and can still function even if many of the dynamical requirements of standard training procedures are relaxed. Thanks to a set of general theorems corroborated by numerical experiments across several CL models, our results provide theoretical foundations for the study and development of CL methods for biological and neuromorphic neural networks.
    Metropolitan Segment Traffic Speeds from Massive Floating Car Data in 10 Cities. (arXiv:2302.08761v3 [cs.LG] UPDATED)
    Traffic analysis is crucial for urban operations and planning, while the availability of dense urban traffic data beyond loop detectors is still scarce. We present a large-scale floating vehicle dataset of per-street segment traffic information, Metropolitan Segment Traffic Speeds from Massive Floating Car Data in 10 Cities (MeTS-10), available for 10 global cities with a 15-minute resolution for collection periods ranging between 108 and 361 days in 2019-2021 and covering more than 1500 square kilometers per metropolitan area. MeTS-10 features traffic speed information at all street levels from main arterials to local streets for Antwerp, Bangkok, Barcelona, Berlin, Chicago, Istanbul, London, Madrid, Melbourne and Moscow. The dataset leverages the industrial-scale floating vehicle Traffic4cast data with speeds and vehicle counts provided in a privacy-preserving spatio-temporal aggregation. We detail the efficient matching approach mapping the data to the OpenStreetMap road graph. We evaluate the dataset by comparing it with publicly available stationary vehicle detector data (for Berlin, London, and Madrid) and the Uber traffic speed dataset (for Barcelona, Berlin, and London). The comparison highlights the differences across datasets in spatio-temporal coverage and variations in the reported traffic caused by the binning method. MeTS-10 enables novel, city-wide analysis of mobility and traffic patterns for ten major world cities, overcoming current limitations of spatially sparse vehicle detector data. The large spatial and temporal coverage offers an opportunity for joining the MeTS-10 with other datasets, such as traffic surveys in traffic planning studies or vehicle detector data in traffic control settings.
    When Deep Learning Meets Polyhedral Theory: A Survey. (arXiv:2305.00241v2 [math.OC] UPDATED)
    In the past decade, deep learning became the prevalent methodology for predictive modeling thanks to the remarkable accuracy of deep neural networks in tasks such as computer vision and natural language processing. Meanwhile, the structure of neural networks converged back to simpler representations based on piecewise constant and piecewise linear functions such as the Rectified Linear Unit (ReLU), which became the most commonly used type of activation function in neural networks. That made certain types of network structure $\unicode{x2014}$such as the typical fully-connected feedforward neural network$\unicode{x2014}$ amenable to analysis through polyhedral theory and to the application of methodologies such as Linear Programming (LP) and Mixed-Integer Linear Programming (MILP) for a variety of purposes. In this paper, we survey the main topics emerging from this fast-paced area of work, which bring a fresh perspective to understanding neural networks in more detail as well as to applying linear optimization techniques to train, verify, and reduce the size of such networks.
    Principled Pruning of Bayesian Neural Networks through Variational Free Energy Minimization. (arXiv:2210.09134v2 [cs.LG] UPDATED)
    Bayesian model reduction provides an efficient approach for comparing the performance of all nested sub-models of a model, without re-evaluating any of these sub-models. Until now, Bayesian model reduction has been applied mainly in the computational neuroscience community on simple models. In this paper, we formulate and apply Bayesian model reduction to perform principled pruning of Bayesian neural networks, based on variational free energy minimization. Direct application of Bayesian model reduction, however, gives rise to approximation errors. Therefore, a novel iterative pruning algorithm is presented to alleviate the problems arising with naive Bayesian model reduction, as supported experimentally on the publicly available UCI datasets for different inference algorithms. This novel parameter pruning scheme solves the shortcomings of current state-of-the-art pruning methods that are used by the signal processing community. The proposed approach has a clear stopping criterion and minimizes the same objective that is used during training. Next to these benefits, our experiments indicate better model performance in comparison to state-of-the-art pruning schemes.
    StyleDiff: Attribute Comparison Between Unlabeled Datasets in Latent Disentangled Space. (arXiv:2303.05102v2 [stat.ML] UPDATED)
    One major challenge in machine learning applications is coping with mismatches between the datasets used in the development and those obtained in real-world applications. These mismatches may lead to inaccurate predictions and errors, resulting in poor product quality and unreliable systems. In this study, we propose StyleDiff to inform developers of the differences between the two datasets for the steady development of machine learning systems. Using disentangled image spaces obtained from recently proposed generative models, StyleDiff compares the two datasets by focusing on attributes in the images and provides an easy-to-understand analysis of the differences between the datasets. The proposed StyleDiff performs in $O (d N\log N)$, where $N$ is the size of the datasets and $d$ is the number of attributes, enabling the application to large datasets. We demonstrate that StyleDiff accurately detects differences between datasets and presents them in an understandable format using, for example, driving scenes datasets.
    DR.CPO: Diversified and Realistic 3D Augmentation via Iterative Construction, Random Placement, and HPR Occlusion. (arXiv:2303.12743v4 [cs.CV] UPDATED)
    In autonomous driving, data augmentation is commonly used for improving 3D object detection. The most basic methods include insertion of copied objects and rotation and scaling of the entire training frame. Numerous variants have been developed as well. The existing methods, however, are considerably limited when compared to the variety of the real world possibilities. In this work, we develop a diversified and realistic augmentation method that can flexibly construct a whole-body object, freely locate and rotate the object, and apply self-occlusion and external-occlusion accordingly. To improve the diversity of the whole-body object construction, we develop an iterative method that stochastically combines multiple objects observed from the real world into a single object. Unlike the existing augmentation methods, the constructed objects can be randomly located and rotated in the training frame because proper occlusions can be reflected to the whole-body objects in the final step. Finally, proper self-occlusion at each local object level and external-occlusion at the global frame level are applied using the Hidden Point Removal (HPR) algorithm that is computationally efficient. HPR is also used for adaptively controlling the point density of each object according to the object's distance from the LiDAR. Experiment results show that the proposed DR.CPO algorithm is data-efficient and model-agnostic without incurring any computational overhead. Also, DR.CPO can improve mAP performance by 2.08% when compared to the best 3D detection result known for KITTI dataset. The code is available at https://github.com/SNU-DRL/DRCPO.git
    Learning Melanocytic Cell Masks from Adjacent Stained Tissue. (arXiv:2211.00646v3 [q-bio.QM] UPDATED)
    Melanoma is one of the most aggressive forms of skin cancer, causing a large proportion of skin cancer deaths. However, melanoma diagnoses by pathologists shows low interrater reliability. As melanoma is a cancer of the melanocyte, there is a clear need to develop a melanocytic cell segmentation tool that is agnostic to pathologist variability and automates pixel-level annotation. Gigapixel-level pathologist labeling, however, is impractical. Herein, we propose a means to train deep neural networks for melanocytic cell segmentation from hematoxylin and eosin (H&E) stained sections and paired immunohistochemistry (IHC) of adjacent tissue sections, achieving a mean IOU of 0.64 despite imperfect ground-truth labels.
    On-Demand Communication for Asynchronous Multi-Agent Bandits. (arXiv:2302.07446v2 [cs.LG] UPDATED)
    This paper studies a cooperative multi-agent multi-armed stochastic bandit problem where agents operate asynchronously -- agent pull times and rates are unknown, irregular, and heterogeneous -- and face the same instance of a K-armed bandit problem. Agents can share reward information to speed up the learning process at additional communication costs. We propose ODC, an on-demand communication protocol that tailors the communication of each pair of agents based on their empirical pull times. ODC is efficient when the pull times of agents are highly heterogeneous, and its communication complexity depends on the empirical pull times of agents. ODC is a generic protocol that can be integrated into most cooperative bandit algorithms without degrading their performance. We then incorporate ODC into the natural extensions of UCB and AAE algorithms and propose two communication-efficient cooperative algorithms. Our analysis shows that both algorithms are near-optimal in regret.
    0/1 Deep Neural Networks via Block Coordinate Descent. (arXiv:2206.09379v2 [cs.LG] UPDATED)
    The step function is one of the simplest and most natural activation functions for deep neural networks (DNNs). As it counts 1 for positive variables and 0 for others, its intrinsic characteristics (e.g., discontinuity and no viable information of subgradients) impede its development for several decades. Even if there is an impressive body of work on designing DNNs with continuous activation functions that can be deemed as surrogates of the step function, it is still in the possession of some advantageous properties, such as complete robustness to outliers and being capable of attaining the best learning-theoretic guarantee of predictive accuracy. Hence, in this paper, we aim to train DNNs with the step function used as an activation function (dubbed as 0/1 DNNs). We first reformulate 0/1 DNNs as an unconstrained optimization problem and then solve it by a block coordinate descend (BCD) method. Moreover, we acquire closed-form solutions for sub-problems of BCD as well as its convergence properties. Furthermore, we also integrate $\ell_{2,0}$-regularization into 0/1 DNN to accelerate the training process and compress the network scale. As a result, the proposed algorithm has a high performance on classifying MNIST and Fashion-MNIST datasets. As a result, the proposed algorithm has a desirable performance on classifying MNIST, FashionMNIST, Cifar10, and Cifar100 datasets.
    Extending regionalization algorithms to explore spatial process heterogeneity. (arXiv:2206.09429v4 [stat.ME] UPDATED)
    In spatial regression models, spatial heterogeneity may be considered with either continuous or discrete specifications. The latter is related to delineation of spatially connected regions with homogeneous relationships between variables (spatial regimes). Although various regionalization algorithms have been proposed and studied in the field of spatial analytics, methods to optimize spatial regimes have been largely unexplored. In this paper, we propose two new algorithms for spatial regime delineation, two-stage K-Models and Regional-K-Models. We also extend the classic Automatic Zoning Procedure to spatial regression context. The proposed algorithms are applied to a series of synthetic datasets and two real-world datasets. Results indicate that all three algorithms achieve superior or comparable performance to existing approaches, while the two-stage K-Models algorithm largely outperforms existing approaches on model fitting, region reconstruction, and coefficient estimation. Our work enriches the spatial analytics toolbox to explore spatial heterogeneous processes.
    Hypernetwork approach to Bayesian MAML. (arXiv:2210.02796v2 [cs.LG] UPDATED)
    The main goal of Few-Shot learning algorithms is to enable learning from small amounts of data. One of the most popular and elegant Few-Shot learning approaches is Model-Agnostic Meta-Learning (MAML). The main idea behind this method is to learn the shared universal weights of a meta-model, which are then adapted for specific tasks. However, the method suffers from over-fitting and poorly quantifies uncertainty due to limited data size. Bayesian approaches could, in principle, alleviate these shortcomings by learning weight distributions in place of point-wise weights. Unfortunately, previous modifications of MAML are limited due to the simplicity of Gaussian posteriors, MAML-like gradient-based weight updates, or by the same structure enforced for universal and adapted weights. In this paper, we propose a novel framework for Bayesian MAML called BayesianHMAML, which employs Hypernetworks for weight updates. It learns the universal weights point-wise, but a probabilistic structure is added when adapted for specific tasks. In such a framework, we can use simple Gaussian distributions or more complicated posteriors induced by Continuous Normalizing Flows.
    Federated Adaptive Prompt Tuning for Multi-domain Collaborative Learning. (arXiv:2211.07864v2 [cs.LG] UPDATED)
    Federated learning (FL) enables multiple clients to collaboratively train a global model without disclosing their data. Previous researches often require training the complete model parameters. However, the emergence of powerful pre-trained models makes it possible to achieve higher performance with fewer learnable parameters in FL. In this paper, we propose a federated adaptive prompt tuning algorithm, FedAPT, for multi-domain collaborative image classification with powerful foundation models, like CLIP. Compared with direct federated prompt tuning, our core idea is to adaptively unlock specific domain knowledge for each test sample in order to provide them with personalized prompts. To implement this idea, we design an adaptive prompt tuning module, which consists of a meta prompt, an adaptive network, and some keys. The server randomly generates a set of keys and assigns a unique key to each client. Then all clients cooperatively train the global adaptive network and meta prompt with the local datasets and the frozen keys. Ultimately, the global aggregation model can assign a personalized prompt to CLIP based on the domain features of each test sample. We perform extensive experiments on two multi-domain image classification datasets across two different settings - supervised and unsupervised. The results show that FedAPT can achieve better performance with less than 10\% of the number of parameters of the fully trained model, and the global model can perform well in diverse client domains simultaneously.
    Point Cloud-based Proactive Link Quality Prediction for Millimeter-wave Communications. (arXiv:2301.00752v3 [cs.NI] UPDATED)
    This study demonstrates the feasibility of point cloud-based proactive link quality prediction for millimeter-wave (mmWave) communications. Previous studies have proposed machine learning-based methods to predict received signal strength for future time periods using time series of depth images to mitigate the line-of-sight (LOS) path blockage by pedestrians in mmWave communication. However, these image-based methods have limited applicability due to privacy concerns as camera images may contain sensitive information. This study proposes a point cloud-based method for mmWave link quality prediction and demonstrates its feasibility through experiments. Point clouds represent three-dimensional (3D) spaces as a set of points and are sparser and less likely to contain sensitive information than camera images. Additionally, point clouds provide 3D position and motion information, which is necessary for understanding the radio propagation environment involving pedestrians. This study designs the mmWave link quality prediction method and conducts realistic indoor experiments, where the link quality fluctuates significantly due to human blockage, using commercially available IEEE 802.11ad-based 60 GHz wireless LAN devices and Kinect v2 RGB-D camera and Velodyne VLP-16 light detection and ranging (LiDAR) for point cloud acquisition. The experimental results showed that our proposed method can predict future large attenuation of mmWave received signal strength and throughput induced by the LOS path blockage by pedestrians with comparable or superior accuracy to image-based prediction methods. Hence, our point cloud-based method can serve as a viable alternative to image-based methods.
    Sequential Informed Federated Unlearning: Efficient and Provable Client Unlearning in Federated Optimization. (arXiv:2211.11656v4 [cs.LG] UPDATED)
    The aim of Machine Unlearning (MU) is to provide theoretical guarantees on the removal of the contribution of a given data point from a training procedure. Federated Unlearning (FU) consists in extending MU to unlearn a given client's contribution from a federated training routine. Current FU approaches are generally not scalable, and do not come with sound theoretical quantification of the effectiveness of unlearning. In this work we present Informed Federated Unlearning (IFU), a novel efficient and quantifiable FU approach. Upon unlearning request from a given client, IFU identifies the optimal FL iteration from which FL has to be reinitialized, with unlearning guarantees obtained through a randomized perturbation mechanism. The theory of IFU is also extended to account for sequential unlearning requests. Experimental results on different tasks and dataset show that IFU leads to more efficient unlearning procedures as compared to basic re-training and state-of-the-art FU approaches.
    Simulation-Based Optimization of User Interfaces for Quality-Assuring Machine Learning Model Predictions. (arXiv:2104.01129v2 [cs.HC] UPDATED)
    Quality-sensitive applications of machine learning (ML) require quality assurance (QA) by humans before the predictions of an ML model can be deployed. QA for ML (QA4ML) interfaces require users to view a large amount of data and perform many interactions to correct errors made by the ML model. An optimized user interface (UI) can significantly reduce interaction costs. While UI optimization can be informed by user studies evaluating design options, this approach is not scalable because there are typically numerous small variations that can affect the efficiency of a QA4ML interface. Hence, we propose using simulation to evaluate and aid the optimization of QA4ML interfaces. In particular, we focus on simulating the combined effects of human intelligence in initiating appropriate interaction commands and machine intelligence in providing algorithmic assistance for accelerating QA4ML processes. As QA4ML is usually labor-intensive, we use the simulated task completion time as the metric for UI optimization under different interface and algorithm setups. We demonstrate the usage of this UI design method in several QA4ML applications.
    Combining Inductive and Deductive Reasoning for Query Answering over Incomplete Knowledge Graphs. (arXiv:2106.14052v2 [cs.AI] UPDATED)
    Current methods for embedding-based query answering over incomplete Knowledge Graphs (KGs) only focus on inductive reasoning, i.e., predicting answers by learning patterns from the data, and lack the complementary ability to do deductive reasoning, which requires the application of domain knowledge to infer further information. To address this shortcoming, we investigate the problem of incorporating ontologies into embedding-based query answering models by defining the task of embedding-based ontology-mediated query answering. We propose various integration strategies into prominent representatives of embedding models that involve (1) different ontology-driven data augmentation techniques and (2) adaptation of the loss function to enforce the ontology axioms. We design novel benchmarks for the considered task based on the LUBM and the NELL KGs and evaluate our methods on them. The achieved improvements in the setting that requires both inductive and deductive reasoning are from 20% to 55% in HITS@3.
    Natural Quantum Monte Carlo Computation of Excited States. (arXiv:2308.16848v1 [physics.comp-ph])
    We present a variational Monte Carlo algorithm for estimating the lowest excited states of a quantum system which is a natural generalization of the estimation of ground states. The method has no free parameters and requires no explicit orthogonalization of the different states, instead transforming the problem of finding excited states of a given system into that of finding the ground state of an expanded system. Expected values of arbitrary observables can be calculated, including off-diagonal expectations between different states such as the transition dipole moment. Although the method is entirely general, it works particularly well in conjunction with recent work on using neural networks as variational Ansatze for many-electron systems, and we show that by combining this method with the FermiNet and Psiformer Ansatze we can accurately recover vertical excitation energies and oscillator strengths on molecules as large as benzene. Beyond the examples on molecules presented here, we expect this technique will be of great interest for applications of variational quantum Monte Carlo to atomic, nuclear and condensed matter physics.
    Visual correspondence-based explanations improve AI robustness and human-AI team accuracy. (arXiv:2208.00780v5 [cs.CV] UPDATED)
    Explaining artificial intelligence (AI) predictions is increasingly important and even imperative in many high-stakes applications where humans are the ultimate decision-makers. In this work, we propose two novel architectures of self-interpretable image classifiers that first explain, and then predict (as opposed to post-hoc explanations) by harnessing the visual correspondences between a query image and exemplars. Our models consistently improve (by 1 to 4 points) on out-of-distribution (OOD) datasets while performing marginally worse (by 1 to 2 points) on in-distribution tests than ResNet-50 and a $k$-nearest neighbor classifier (kNN). Via a large-scale, human study on ImageNet and CUB, our correspondence-based explanations are found to be more useful to users than kNN explanations. Our explanations help users more accurately reject AI's wrong decisions than all other tested methods. Interestingly, for the first time, we show that it is possible to achieve complementary human-AI team accuracy (i.e., that is higher than either AI-alone or human-alone), in ImageNet and CUB image classification tasks.
    PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction. (arXiv:2308.16896v1 [cs.CV])
    Semantic segmentation in autonomous driving has been undergoing an evolution from sparse point segmentation to dense voxel segmentation, where the objective is to predict the semantic occupancy of each voxel in the concerned 3D space. The dense nature of the prediction space has rendered existing efficient 2D-projection-based methods (e.g., bird's eye view, range view, etc.) ineffective, as they can only describe a subspace of the 3D scene. To address this, we propose a cylindrical tri-perspective view to represent point clouds effectively and comprehensively and a PointOcc model to process them efficiently. Considering the distance distribution of LiDAR point clouds, we construct the tri-perspective view in the cylindrical coordinate system for more fine-grained modeling of nearer areas. We employ spatial group pooling to maintain structural details during projection and adopt 2D backbones to efficiently process each TPV plane. Finally, we obtain the features of each point by aggregating its projected features on each of the processed TPV planes without the need for any post-processing. Extensive experiments on both 3D occupancy prediction and LiDAR segmentation benchmarks demonstrate that the proposed PointOcc achieves state-of-the-art performance with much faster speed. Specifically, despite only using LiDAR, PointOcc significantly outperforms all other methods, including multi-modal methods, with a large margin on the OpenOccupancy benchmark. Code: https://github.com/wzzheng/PointOcc.
    FedDD: Toward Communication-efficient Federated Learning with Differential Parameter Dropout. (arXiv:2308.16835v1 [cs.LG])
    Federated Learning (FL) requires frequent exchange of model parameters, which leads to long communication delay, especially when the network environments of clients vary greatly. Moreover, the parameter server needs to wait for the slowest client (i.e., straggler, which may have the largest model size, lowest computing capability or worst network condition) to upload parameters, which may significantly degrade the communication efficiency. Commonly-used client selection methods such as partial client selection would lead to the waste of computing resources and weaken the generalization of the global model. To tackle this problem, along a different line, in this paper, we advocate the approach of model parameter dropout instead of client selection, and accordingly propose a novel framework of Federated learning scheme with Differential parameter Dropout (FedDD). FedDD consists of two key modules: dropout rate allocation and uploaded parameter selection, which will optimize the model parameter uploading ratios tailored to different clients' heterogeneous conditions and also select the proper set of important model parameters for uploading subject to clients' dropout rate constraints. Specifically, the dropout rate allocation is formulated as a convex optimization problem, taking system heterogeneity, data heterogeneity, and model heterogeneity among clients into consideration. The uploaded parameter selection strategy prioritizes on eliciting important parameters for uploading to speedup convergence. Furthermore, we theoretically analyze the convergence of the proposed FedDD scheme. Extensive performance evaluations demonstrate that the proposed FedDD scheme can achieve outstanding performances in both communication efficiency and model convergence, and also possesses a strong generalization capability to data of rare classes.
    Learning Optimal Strategies for Temporal Tasks in Stochastic Games. (arXiv:2102.04307v3 [cs.AI] UPDATED)
    Synthesis from linear temporal logic (LTL) specifications provides assured controllers for systems operating in stochastic and potentially adversarial environments. Automatic synthesis tools, however, require a model of the environment to construct controllers. In this work, we introduce a model-free reinforcement learning (RL) approach to derive controllers from given LTL specifications even when the environment is completely unknown. We model the problem as a stochastic game (SG) between the controller and the adversarial environment; we then learn optimal control strategies that maximize the probability of satisfying the LTL specifications against the worst-case environment behavior. We first construct a product game using the deterministic parity automaton (DPA) translated from the given LTL specification. By deriving distinct rewards and discount factors from the acceptance condition of the DPA, we reduce the maximization of the worst-case probability of satisfying the LTL specification into the maximization of a discounted reward objective in the product game; this enables the use of model-free RL algorithms to learn an optimal controller strategy. To deal with the common scalability problems when the number of sets defining the acceptance condition of the DPA (usually referred as colors), is large, we propose a lazy color generation method where distinct rewards and discount factors are utilized only when needed, and an approximate method where the controller eventually focuses on only one color. In several case studies, we show that our approach is scalable to a wide range of LTL formulas, significantly outperforming existing methods for learning controllers from LTL specifications in SGs.
    Leveraging Image-based Generative Adversarial Networks for Time Series Generation. (arXiv:2112.08060v2 [cs.LG] UPDATED)
    Generative models for images have gained significant attention in computer vision and natural language processing due to their ability to generate realistic samples from complex data distributions. To leverage the advances of image-based generative models for the time series domain, we propose a two-dimensional image representation for time series, the Extended Intertemporal Return Plot (XIRP). Our approach captures the intertemporal time series dynamics in a scale-invariant and invertible way, reducing training time and improving sample quality. We benchmark synthetic XIRPs obtained by an off-the-shelf Wasserstein GAN with gradient penalty (WGAN-GP) to other image representations and models regarding similarity and predictive ability metrics. Our novel, validated image representation for time series consistently and significantly outperforms a state-of-the-art RNN-based generative model regarding predictive ability. Further, we introduce an improved stochastic inversion to substantially improve simulation quality regardless of the representation and provide the prospect of transfer potentials in other domains.
    Dynamical systems' based neural networks. (arXiv:2210.02373v2 [cs.LG] UPDATED)
    Neural networks have gained much interest because of their effectiveness in many applications. However, their mathematical properties are generally not well understood. If there is some underlying geometric structure inherent to the data or to the function to approximate, it is often desirable to take this into account in the design of the neural network. In this work, we start with a non-autonomous ODE and build neural networks using a suitable, structure-preserving, numerical time-discretisation. The structure of the neural network is then inferred from the properties of the ODE vector field. Besides injecting more structure into the network architectures, this modelling procedure allows a better theoretical understanding of their behaviour. We present two universal approximation results and demonstrate how to impose some particular properties on the neural networks. A particular focus is on 1-Lipschitz architectures including layers that are not 1-Lipschitz. These networks are expressive and robust against adversarial attacks, as shown for the CIFAR-10 and CIFAR-100 datasets.
    Branches of a Tree: Taking Derivatives of Programs with Discrete and Branching Randomness in High Energy Physics. (arXiv:2308.16680v1 [stat.ML])
    We propose to apply several gradient estimation techniques to enable the differentiation of programs with discrete randomness in High Energy Physics. Such programs are common in High Energy Physics due to the presence of branching processes and clustering-based analysis. Thus differentiating such programs can open the way for gradient based optimization in the context of detector design optimization, simulator tuning, or data analysis and reconstruction optimization. We discuss several possible gradient estimation strategies, including the recent Stochastic AD method, and compare them in simplified detector design experiments. In doing so we develop, to the best of our knowledge, the first fully differentiable branching program.
    Latent Variable Multi-output Gaussian Processes for Hierarchical Datasets. (arXiv:2308.16822v1 [cs.LG])
    Multi-output Gaussian processes (MOGPs) have been introduced to deal with multiple tasks by exploiting the correlations between different outputs. Generally, MOGPs models assume a flat correlation structure between the outputs. However, such a formulation does not account for more elaborate relationships, for instance, if several replicates were observed for each output (which is a typical setting in biological experiments). This paper proposes an extension of MOGPs for hierarchical datasets (i.e. datasets for which the relationships between observations can be represented within a tree structure). Our model defines a tailored kernel function accounting for hierarchical structures in the data to capture different levels of correlations while leveraging the introduction of latent variables to express the underlying dependencies between outputs through a dedicated kernel. This latter feature is expected to significantly improve scalability as the number of tasks increases. An extensive experimental study involving both synthetic and real-world data from genomics and motion capture is proposed to support our claims.
    Autoencoder-based Online Data Quality Monitoring for the CMS Electromagnetic Calorimeter. (arXiv:2308.16659v1 [physics.ins-det])
    The online Data Quality Monitoring system (DQM) of the CMS electromagnetic calorimeter (ECAL) is a crucial operational tool that allows ECAL experts to quickly identify, localize, and diagnose a broad range of detector issues that would otherwise hinder physics-quality data taking. Although the existing ECAL DQM system has been continuously updated to respond to new problems, it remains one step behind newer and unforeseen issues. Using unsupervised deep learning, a real-time autoencoder-based anomaly detection system is developed that is able to detect ECAL anomalies unseen in past data. After accounting for spatial variations in the response of the ECAL and the temporal evolution of anomalies, the new system is able to efficiently detect anomalies while maintaining an estimated false discovery rate between $10^{-2}$ to $10^{-4}$, beating existing benchmarks by about two orders of magnitude. The real-world performance of the system is validated using anomalies found in 2018 and 2022 LHC collision data. Additionally, first results from deploying the autoencoder-based system in the CMS online DQM workflow for the ECAL barrel during Run 3 of the LHC are presented, showing its promising performance in detecting obscure issues that could have been missed in the existing DQM system.
    Prediction of Diblock Copolymer Morphology via Machine Learning. (arXiv:2308.16886v1 [physics.chem-ph])
    A machine learning approach is presented to accelerate the computation of block polymer morphology evolution for large domains over long timescales. The strategy exploits the separation of characteristic times between coarse-grained particle evolution on the monomer scale and slow morphological evolution over mesoscopic scales. In contrast to empirical continuum models, the proposed approach learns stochastically driven defect annihilation processes directly from particle-based simulations. A UNet architecture that respects different boundary conditions is adopted, thereby allowing periodic and fixed substrate boundary conditions of arbitrary shape. Physical concepts are also introduced via the loss function and symmetries are incorporated via data augmentation. The model is validated using three different use cases. Explainable artificial intelligence methods are applied to visualize the morphology evolution over time. This approach enables the generation of large system sizes and long trajectories to investigate defect densities and their evolution under different types of confinement. As an application, we demonstrate the importance of accessing late-stage morphologies for understanding particle diffusion inside a single block. This work has implications for directed self-assembly and materials design in micro-electronics, battery materials, and membranes.
    Everything, Everywhere All in One Evaluation: Using Multiverse Analysis to Evaluate the Influence of Model Design Decisions on Algorithmic Fairness. (arXiv:2308.16681v1 [stat.ML])
    A vast number of systems across the world use algorithmic decision making (ADM) to (partially) automate decisions that have previously been made by humans. When designed well, these systems promise more objective decisions while saving large amounts of resources and freeing up human time. However, when ADM systems are not designed well, they can lead to unfair decisions which discriminate against societal groups. The downstream effects of ADMs critically depend on the decisions made during the systems' design and implementation, as biases in data can be mitigated or reinforced along the modeling pipeline. Many of these design decisions are made implicitly, without knowing exactly how they will influence the final system. It is therefore important to make explicit the decisions made during the design of ADM systems and understand how these decisions affect the fairness of the resulting system. To study this issue, we draw on insights from the field of psychology and introduce the method of multiverse analysis for algorithmic fairness. In our proposed method, we turn implicit design decisions into explicit ones and demonstrate their fairness implications. By combining decisions, we create a grid of all possible "universes" of decision combinations. For each of these universes, we compute metrics of fairness and performance. Using the resulting dataset, one can see how and which decisions impact fairness. We demonstrate how multiverse analyses can be used to better understand variability and robustness of algorithmic fairness using an exemplary case study of predicting public health coverage of vulnerable populations for potential interventions. Our results illustrate how decisions during the design of a machine learning system can have surprising effects on its fairness and how to detect these effects using multiverse analysis.
    Diffusion Models for Interferometric Satellite Aperture Radar. (arXiv:2308.16847v1 [cs.CV])
    Probabilistic Diffusion Models (PDMs) have recently emerged as a very promising class of generative models, achieving high performance in natural image generation. However, their performance relative to non-natural images, like radar-based satellite data, remains largely unknown. Generating large amounts of synthetic (and especially labelled) satellite data is crucial to implement deep-learning approaches for the processing and analysis of (interferometric) satellite aperture radar data. Here, we leverage PDMs to generate several radar-based satellite image datasets. We show that PDMs succeed in generating images with complex and realistic structures, but that sampling time remains an issue. Indeed, accelerated sampling strategies, which work well on simple image datasets like MNIST, fail on our radar datasets. We provide a simple and versatile open-source https://github.com/thomaskerdreux/PDM_SAR_InSAR_generation to train, sample and evaluate PDMs using any dataset on a single GPU.
    The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants. (arXiv:2308.16884v1 [cs.CL])
    We present Belebele, a multiple-choice machine reading comprehension (MRC) dataset spanning 122 language variants. Significantly expanding the language coverage of natural language understanding (NLU) benchmarks, this dataset enables the evaluation of text models in high-, medium-, and low-resource languages. Each question is based on a short passage from the Flores-200 dataset and has four multiple-choice answers. The questions were carefully curated to discriminate between models with different levels of general language comprehension. The English dataset on its own proves difficult enough to challenge state-of-the-art language models. Being fully parallel, this dataset enables direct comparison of model performance across all languages. We use this dataset to evaluate the capabilities of multilingual masked language models (MLMs) and large language models (LLMs). We present extensive results and find that despite significant cross-lingual transfer in English-centric LLMs, much smaller MLMs pretrained on balanced multilingual data still understand far more languages. We also observe that larger vocabulary size and conscious vocabulary construction correlate with better performance on low-resource languages. Overall, Belebele opens up new avenues for evaluating and analyzing the multilingual capabilities of NLP systems.
    Transformers as Support Vector Machines. (arXiv:2308.16898v1 [cs.LG])
    Since its inception in "Attention Is All You Need", transformer architecture has led to revolutionary advancements in NLP. The attention layer within the transformer admits a sequence of input tokens $X$ and makes them interact through pairwise similarities computed as softmax$(XQK^\top X^\top)$, where $(K,Q)$ are the trainable key-query parameters. In this work, we establish a formal equivalence between the optimization geometry of self-attention and a hard-margin SVM problem that separates optimal input tokens from non-optimal tokens using linear constraints on the outer-products of token pairs. This formalism allows us to characterize the implicit bias of 1-layer transformers optimized with gradient descent: (1) Optimizing the attention layer with vanishing regularization, parameterized by $(K,Q)$, converges in direction to an SVM solution minimizing the nuclear norm of the combined parameter $W=KQ^\top$. Instead, directly parameterizing by $W$ minimizes a Frobenius norm objective. We characterize this convergence, highlighting that it can occur toward locally-optimal directions rather than global ones. (2) Complementing this, we prove the local/global directional convergence of gradient descent under suitable geometric conditions. Importantly, we show that over-parameterization catalyzes global convergence by ensuring the feasibility of the SVM problem and by guaranteeing a benign optimization landscape devoid of stationary points. (3) While our theory applies primarily to linear prediction heads, we propose a more general SVM equivalence that predicts the implicit bias with nonlinear heads. Our findings are applicable to arbitrary datasets and their validity is verified via experiments. We also introduce several open problems and research directions. We believe these findings inspire the interpretation of transformers as a hierarchy of SVMs that separates and selects optimal tokens.
    Federated Learning in UAV-Enhanced Networks: Joint Coverage and Convergence Time Optimization. (arXiv:2308.16889v1 [cs.LG])
    Federated learning (FL) involves several devices that collaboratively train a shared model without transferring their local data. FL reduces the communication overhead, making it a promising learning method in UAV-enhanced wireless networks with scarce energy resources. Despite the potential, implementing FL in UAV-enhanced networks is challenging, as conventional UAV placement methods that maximize coverage increase the FL delay significantly. Moreover, the uncertainty and lack of a priori information about crucial variables, such as channel quality, exacerbate the problem. In this paper, we first analyze the statistical characteristics of a UAV-enhanced wireless sensor network (WSN) with energy harvesting. We then develop a model and solution based on the multi-objective multi-armed bandit theory to maximize the network coverage while minimizing the FL delay. Besides, we propose another solution that is particularly useful with large action sets and strict energy constraints at the UAVs. Our proposal uses a scalarized best-arm identification algorithm to find the optimal arms that maximize the ratio of the expected reward to the expected energy cost by sequentially eliminating one or more arms in each round. Then, we derive the upper bound on the error probability of our multi-objective and cost-aware algorithm. Numerical results show the effectiveness of our approach.
    Efficacy of Neural Prediction-Based NAS for Zero-Shot NAS Paradigm. (arXiv:2308.16775v1 [cs.LG])
    In prediction-based Neural Architecture Search (NAS), performance indicators derived from graph convolutional networks have shown significant success. These indicators, achieved by representing feed-forward structures as component graphs through one-hot encoding, face a limitation: their inability to evaluate architecture performance across varying search spaces. In contrast, handcrafted performance indicators (zero-shot NAS), which use the same architecture with random initialization, can generalize across multiple search spaces. Addressing this limitation, we propose a novel approach for zero-shot NAS using deep learning. Our method employs Fourier sum of sines encoding for convolutional kernels, enabling the construction of a computational feed-forward graph with a structure similar to the architecture under evaluation. These encodings are learnable and offer a comprehensive view of the architecture's topological information. An accompanying multi-layer perceptron (MLP) then ranks these architectures based on their encodings. Experimental results show that our approach surpasses previous methods using graph convolutional networks in terms of correlation on the NAS-Bench-201 dataset and exhibits a higher convergence rate. Moreover, our extracted feature representation trained on each NAS-Benchmark is transferable to other NAS-Benchmarks, showing promising generalizability across multiple search spaces. The code is available at: https://github.com/minh1409/DFT-NPZS-NAS
    Moreau Envelope ADMM for Decentralized Weakly Convex Optimization. (arXiv:2308.16752v1 [math.OC])
    This paper proposes a proximal variant of the alternating direction method of multipliers (ADMM) for distributed optimization. Although the current versions of ADMM algorithm provide promising numerical results in producing solutions that are close to optimal for many convex and non-convex optimization problems, it remains unclear if they can converge to a stationary point for weakly convex and locally non-smooth functions. Through our analysis using the Moreau envelope function, we demonstrate that MADM can indeed converge to a stationary point under mild conditions. Our analysis also includes computing the bounds on the amount of change in the dual variable update step by relating the gradient of the Moreau envelope function to the proximal function. Furthermore, the results of our numerical experiments indicate that our method is faster and more robust than widely-used approaches.
    StratMed: Relevance Stratification for Low-resource Medication Recommendation. (arXiv:2308.16781v1 [cs.AI])
    With the growing imbalance between limited medical resources and escalating demands, AI-based clinical tasks have become paramount. Medication recommendation, as a sub-domain, aims to amalgamate longitudinal patient history with medical knowledge, assisting physicians in prescribing safer and more accurate medication combinations. Existing methods overlook the inherent long-tail distribution in medical data, lacking balanced representation between head and tail data, which leads to sub-optimal model performance. To address this challenge, we introduce StratMed, a model that incorporates an innovative relevance stratification mechanism. It harmonizes discrepancies in data long-tail distribution and strikes a balance between the safety and accuracy of medication combinations. Specifically, we first construct a pre-training method using deep learning networks to obtain entity representation. After that, we design a pyramid-like data stratification method to obtain more generalized entity relationships by reinforcing the features of unpopular entities. Based on this relationship, we designed two graph structures to express medication precision and safety at the same level to obtain visit representations. Finally, the patient's historical clinical information is fitted to generate medication combinations for the current health condition. Experiments on the MIMIC-III dataset demonstrate that our method has outperformed current state-of-the-art methods in four evaluation metrics (including safety and accuracy).
    Rank Collapse Causes Over-Smoothing and Over-Correlation in Graph Neural Networks. (arXiv:2308.16800v1 [cs.LG])
    Our study reveals new theoretical insights into over-smoothing and feature over-correlation in deep graph neural networks. We show the prevalence of invariant subspaces, demonstrating a fixed relative behavior that is unaffected by feature transformations. Our work clarifies recent observations related to convergence to a constant state and a potential over-separation of node states, as the amplification of subspaces only depends on the spectrum of the aggregation function. In linear scenarios, this leads to node representations being dominated by a low-dimensional subspace with an asymptotic convergence rate independent of the feature transformations. This causes a rank collapse of the node representations, resulting in over-smoothing when smooth vectors span this subspace, and over-correlation even when over-smoothing is avoided. Guided by our theory, we propose a sum of Kronecker products as a beneficial property that can provably prevent over-smoothing, over-correlation, and rank collapse. We empirically extend our insights to the non-linear case, demonstrating the inability of existing models to capture linearly independent features.
    Language-Conditioned Path Planning. (arXiv:2308.16893v1 [cs.RO])
    Contact is at the core of robotic manipulation. At times, it is desired (e.g. manipulation and grasping), and at times, it is harmful (e.g. when avoiding obstacles). However, traditional path planning algorithms focus solely on collision-free paths, limiting their applicability in contact-rich tasks. To address this limitation, we propose the domain of Language-Conditioned Path Planning, where contact-awareness is incorporated into the path planning problem. As a first step in this domain, we propose Language-Conditioned Collision Functions (LACO) a novel approach that learns a collision function using only a single-view image, language prompt, and robot configuration. LACO predicts collisions between the robot and the environment, enabling flexible, conditional path planning without the need for manual object annotations, point cloud data, or ground-truth object meshes. In both simulation and the real world, we demonstrate that LACO can facilitate complex, nuanced path plans that allow for interaction with objects that are safe to collide, rather than prohibiting any collision.
    Multi-Objective Decision Transformers for Offline Reinforcement Learning. (arXiv:2308.16379v1 [cs.LG])
    Offline Reinforcement Learning (RL) is structured to derive policies from static trajectory data without requiring real-time environment interactions. Recent studies have shown the feasibility of framing offline RL as a sequence modeling task, where the sole aim is to predict actions based on prior context using the transformer architecture. However, the limitation of this single task learning approach is its potential to undermine the transformer model's attention mechanism, which should ideally allocate varying attention weights across different tokens in the input context for optimal prediction. To address this, we reformulate offline RL as a multi-objective optimization problem, where the prediction is extended to states and returns. We also highlight a potential flaw in the trajectory representation used for sequence modeling, which could generate inaccuracies when modeling the state and return distributions. This is due to the non-smoothness of the action distribution within the trajectory dictated by the behavioral policy. To mitigate this issue, we introduce action space regions to the trajectory representation. Our experiments on D4RL benchmark locomotion tasks reveal that our propositions allow for more effective utilization of the attention mechanism in the transformer model, resulting in performance that either matches or outperforms current state-of-the art methods.
    Information Theoretically Optimal Sample Complexity of Learning Dynamical Directed Acyclic Graphs. (arXiv:2308.16859v1 [stat.ML])
    In this article, the optimal sample complexity of learning the underlying interaction/dependencies of a Linear Dynamical System (LDS) over a Directed Acyclic Graph (DAG) is studied. The sample complexity of learning a DAG's structure is well-studied for static systems, where the samples of nodal states are independent and identically distributed (i.i.d.). However, such a study is less explored for DAGs with dynamical systems, where the nodal states are temporally correlated. We call such a DAG underlying an LDS as \emph{dynamical} DAG (DDAG). In particular, we consider a DDAG where the nodal dynamics are driven by unobserved exogenous noise sources that are wide-sense stationary (WSS) in time but are mutually uncorrelated, and have the same {power spectral density (PSD)}. Inspired by the static settings, a metric and an algorithm based on the PSD matrix of the observed time series are proposed to reconstruct the DDAG. The equal noise PSD assumption can be relaxed such that identifiability conditions for DDAG reconstruction are not violated. For the LDS with WSS (sub) Gaussian exogenous noise sources, it is shown that the optimal sample complexity (or length of state trajectory) needed to learn the DDAG is $n=\Theta(q\log(p/q))$, where $p$ is the number of nodes and $q$ is the maximum number of parents per node. To prove the sample complexity upper bound, a concentration bound for the PSD estimation is derived, under two different sampling strategies. A matching min-max lower bound using generalized Fano's inequality also is provided, thus showing the order optimality of the proposed algorithm.
    Robust Networked Federated Learning for Localization. (arXiv:2308.16737v1 [cs.LG])
    This paper addresses the problem of localization, which is inherently non-convex and non-smooth in a federated setting where the data is distributed across a multitude of devices. Due to the decentralized nature of federated environments, distributed learning becomes essential for scalability and adaptability. Moreover, these environments are often plagued by outlier data, which presents substantial challenges to conventional methods, particularly in maintaining estimation accuracy and ensuring algorithm convergence. To mitigate these challenges, we propose a method that adopts an $L_1$-norm robust formulation within a distributed sub-gradient framework, explicitly designed to handle these obstacles. Our approach addresses the problem in its original form, without resorting to iterative simplifications or approximations, resulting in enhanced computational efficiency and improved estimation accuracy. We demonstrate that our method converges to a stationary point, highlighting its effectiveness and reliability. Through numerical simulations, we confirm the superior performance of our approach, notably in outlier-rich environments, which surpasses existing state-of-the-art localization methods.
    Echocardiographic View Classification with Integrated Out-of-Distribution Detection for Enhanced Automatic Echocardiographic Analysis. (arXiv:2308.16483v1 [eess.SP])
    In the rapidly evolving field of automatic echocardiographic analysis and interpretation, automatic view classification is a critical yet challenging task, owing to the inherent complexity and variability of echocardiographic data. This study presents ECHOcardiography VIew Classification with Out-of-Distribution dEtection (ECHO-VICODE), a novel deep learning-based framework that effectively addresses this challenge by training to classify 31 classes, surpassing previous studies and demonstrating its capacity to handle a wide range of echocardiographic views. Furthermore, ECHO-VICODE incorporates an integrated out-of-distribution (OOD) detection function, leveraging the relative Mahalanobis distance to effectively identify 'near-OOD' instances commonly encountered in echocardiographic data. Through extensive experimentation, we demonstrated the outstanding performance of ECHO-VICODE in terms of view classification and OOD detection, significantly reducing the potential for errors in echocardiographic analyses. This pioneering study significantly advances the domain of automated echocardiography analysis and exhibits promising prospects for substantial applications in extensive clinical research and practice.
    Forecasting Emergency Department Crowding with Advanced Machine Learning Models and Multivariable Input. (arXiv:2308.16544v1 [cs.LG])
    Emergency department (ED) crowding is a significant threat to patient safety and it has been repeatedly associated with increased mortality. Forecasting future service demand has the potential patient outcomes. Despite active research on the subject, several gaps remain: 1) proposed forecasting models have become outdated due to quick influx of advanced machine learning models (ML), 2) amount of multivariable input data has been limited and 3) discrete performance metrics have been rarely reported. In this study, we document the performance of a set of advanced ML models in forecasting ED occupancy 24 hours ahead. We use electronic health record data from a large, combined ED with an extensive set of explanatory variables, including the availability of beds in catchment area hospitals, traffic data from local observation stations, weather variables, etc. We show that N-BEATS and LightGBM outpeform benchmarks with 11 % and 9 % respective improvements and that DeepAR predicts next day crowding with an AUC of 0.76 (95 % CI 0.69-0.84). To the best of our knowledge, this is the first study to document the superiority of LightGBM and N-BEATS over statistical benchmarks in the context of ED forecasting.
    Improving Robustness and Accuracy of Ponzi Scheme Detection on Ethereum Using Time-Dependent Features. (arXiv:2308.16391v1 [cs.CR])
    The rapid development of blockchain has led to more and more funding pouring into the cryptocurrency market, which also attracted cybercriminals' interest in recent years. The Ponzi scheme, an old-fashioned fraud, is now popular on the blockchain, causing considerable financial losses to many crypto-investors. A few Ponzi detection methods have been proposed in the literature, most of which detect a Ponzi scheme based on its smart contract source code or opcode. The contract-code-based approach, while achieving very high accuracy, is not robust: first, the source codes of a majority of contracts on Ethereum are not available, and second, a Ponzi developer can fool a contract-code-based detection model by obfuscating the opcode or inventing a new profit distribution logic that cannot be detected (since these models were trained on existing Ponzi logics only). A transaction-based approach could improve the robustness of detection because transactions, unlike smart contracts, are harder to be manipulated. However, the current transaction-based detection models achieve fairly low accuracy. We address this gap in the literature by developing new detection models that rely only on the transactions, hence guaranteeing the robustness, and moreover, achieve considerably higher Accuracy, Precision, Recall, and F1-score than existing transaction-based models. This is made possible thanks to the introduction of novel time-dependent features that capture Ponzi behaviours characteristics derived from our comprehensive data analyses on Ponzi and non-Ponzi data from the XBlock-ETH repository
    Everyone Can Attack: Repurpose Lossy Compression as a Natural Backdoor Attack. (arXiv:2308.16684v1 [cs.CR])
    The vulnerabilities to backdoor attacks have recently threatened the trustworthiness of machine learning models in practical applications. Conventional wisdom suggests that not everyone can be an attacker since the process of designing the trigger generation algorithm often involves significant effort and extensive experimentation to ensure the attack's stealthiness and effectiveness. Alternatively, this paper shows that there exists a more severe backdoor threat: anyone can exploit an easily-accessible algorithm for silent backdoor attacks. Specifically, this attacker can employ the widely-used lossy image compression from a plethora of compression tools to effortlessly inject a trigger pattern into an image without leaving any noticeable trace; i.e., the generated triggers are natural artifacts. One does not require extensive knowledge to click on the "convert" or "save as" button while using tools for lossy image compression. Via this attack, the adversary does not need to design a trigger generator as seen in prior works and only requires poisoning the data. Empirically, the proposed attack consistently achieves 100% attack success rate in several benchmark datasets such as MNIST, CIFAR-10, GTSRB and CelebA. More significantly, the proposed attack can still achieve almost 100% attack success rate with very small (approximately 10%) poisoning rates in the clean label setting. The generated trigger of the proposed attack using one lossy compression algorithm is also transferable across other related compression algorithms, exacerbating the severity of this backdoor threat. This work takes another crucial step toward understanding the extensive risks of backdoor attacks in practice, urging practitioners to investigate similar attacks and relevant backdoor mitigation methods.
    Communication-Efficient Decentralized Federated Learning via One-Bit Compressive Sensing. (arXiv:2308.16671v1 [cs.LG])
    Decentralized federated learning (DFL) has gained popularity due to its practicality across various applications. Compared to the centralized version, training a shared model among a large number of nodes in DFL is more challenging, as there is no central server to coordinate the training process. Especially when distributed nodes suffer from limitations in communication or computational resources, DFL will experience extremely inefficient and unstable training. Motivated by these challenges, in this paper, we develop a novel algorithm based on the framework of the inexact alternating direction method (iADM). On one hand, our goal is to train a shared model with a sparsity constraint. This constraint enables us to leverage one-bit compressive sensing (1BCS), allowing transmission of one-bit information among neighbour nodes. On the other hand, communication between neighbour nodes occurs only at certain steps, reducing the number of communication rounds. Therefore, the algorithm exhibits notable communication efficiency. Additionally, as each node selects only a subset of neighbours to participate in the training, the algorithm is robust against stragglers. Additionally, complex items are computed only once for several consecutive steps and subproblems are solved inexactly using closed-form solutions, resulting in high computational efficiency. Finally, numerical experiments showcase the algorithm's effectiveness in both communication and computation.
    Majorization-Minimization for sparse SVMs. (arXiv:2308.16858v1 [cs.LG])
    Several decades ago, Support Vector Machines (SVMs) were introduced for performing binary classification tasks, under a supervised framework. Nowadays, they often outperform other supervised methods and remain one of the most popular approaches in the machine learning arena. In this work, we investigate the training of SVMs through a smooth sparse-promoting-regularized squared hinge loss minimization. This choice paves the way to the application of quick training methods built on majorization-minimization approaches, benefiting from the Lipschitz differentiabililty of the loss function. Moreover, the proposed approach allows us to handle sparsity-preserving regularizers promoting the selection of the most significant features, so enhancing the performance. Numerical tests and comparisons conducted on three different datasets demonstrate the good performance of the proposed methodology in terms of qualitative metrics (accuracy, precision, recall, and F 1 score) as well as computational cost.
    Generate Your Own Scotland: Satellite Image Generation Conditioned on Maps. (arXiv:2308.16648v1 [cs.CV])
    Despite recent advancements in image generation, diffusion models still remain largely underexplored in Earth Observation. In this paper we show that state-of-the-art pretrained diffusion models can be conditioned on cartographic data to generate realistic satellite images. We provide two large datasets of paired OpenStreetMap images and satellite views over the region of Mainland Scotland and the Central Belt. We train a ControlNet model and qualitatively evaluate the results, demonstrating that both image quality and map fidelity are possible. Finally, we provide some insights on the opportunities and challenges of applying these models for remote sensing. Our model weights and code for creating the dataset are publicly available at https://github.com/miquel-espinosa/map-sat.
    Calibrated Explanations for Regression. (arXiv:2308.16245v1 [cs.LG])
    Artificial Intelligence (AI) is often an integral part of modern decision support systems (DSSs). The best-performing predictive models used in AI-based DSSs lack transparency. Explainable Artificial Intelligence (XAI) aims to create AI systems that can explain their rationale to human users. Local explanations in XAI can provide information about the causes of individual predictions in terms of feature importance. However, a critical drawback of existing local explanation methods is their inability to quantify the uncertainty associated with a feature's importance. This paper introduces an extension of a feature importance explanation method, Calibrated Explanations (CE), previously only supporting classification, with support for standard regression and probabilistic regression, i.e., the probability that the target is above an arbitrary threshold. The extension for regression keeps all the benefits of CE, such as calibration of the prediction from the underlying model with confidence intervals, uncertainty quantification of feature importance, and allows both factual and counterfactual explanations. CE for standard regression provides fast, reliable, stable, and robust explanations. CE for probabilistic regression provides an entirely new way of creating probabilistic explanations from any ordinary regression model and with a dynamic selection of thresholds. The performance of CE for probabilistic regression regarding stability and speed is comparable to LIME. The method is model agnostic with easily understood conditional rules. An implementation in Python is freely available on GitHub and for installation using pip making the results in this paper easily replicable.
    Training Neural Networks Using Reproducing Kernel Space Interpolation and Model Reduction. (arXiv:2308.16754v1 [math.FA])
    We introduce and study the theory of training neural networks using interpolation techniques from reproducing kernel Hilbert space theory. We generalize the method to Krein spaces, and show that widely-used neural network architectures are subsets of reproducing kernel Krein spaces (RKKS). We study the concept of "associated Hilbert spaces" of RKKS and develop techniques to improve upon the expressivity of various activation functions. Next, using concepts from the theory of functions of several complex variables, we prove a computationally applicable, multidimensional generalization of the celebrated Adamjan- Arov-Krein (AAK) theorem. The theorem yields a novel class of neural networks, called Prolongation Neural Networks (PNN). We demonstrate that, by applying the multidimensional AAK theorem to gain a PNN, one can gain performance superior to both our interpolatory methods and current state-of-the-art methods in noisy environments. We provide useful illustrations of our methods in practice.
    Development and validation of an interpretable machine learning-based calculator for predicting 5-year weight trajectories after bariatric surgery: a multinational retrospective cohort SOPHIA study. (arXiv:2308.16585v1 [cs.LG])
    Background Weight loss trajectories after bariatric surgery vary widely between individuals, and predicting weight loss before the operation remains challenging. We aimed to develop a model using machine learning to provide individual preoperative prediction of 5-year weight loss trajectories after surgery. Methods In this multinational retrospective observational study we enrolled adult participants (aged $\ge$18 years) from ten prospective cohorts (including ABOS [NCT01129297], BAREVAL [NCT02310178], the Swedish Obese Subjects study, and a large cohort from the Dutch Obesity Clinic [Nederlandse Obesitas Kliniek]) and two randomised trials (SleevePass [NCT00793143] and SM-BOSS [NCT00356213]) in Europe, the Americas, and Asia, with a 5 year followup after Roux-en-Y gastric bypass, sleeve g…
    Towards Optimal Patch Size in Vision Transformers for Tumor Segmentation. (arXiv:2308.16598v1 [eess.IV])
    Detection of tumors in metastatic colorectal cancer (mCRC) plays an essential role in the early diagnosis and treatment of liver cancer. Deep learning models backboned by fully convolutional neural networks (FCNNs) have become the dominant model for segmenting 3D computerized tomography (CT) scans. However, since their convolution layers suffer from limited kernel size, they are not able to capture long-range dependencies and global context. To tackle this restriction, vision transformers have been introduced to solve FCNN's locality of receptive fields. Although transformers can capture long-range features, their segmentation performance decreases with various tumor sizes due to the model sensitivity to the input patch size. While finding an optimal patch size improves the performance of vision transformer-based models on segmentation tasks, it is a time-consuming and challenging procedure. This paper proposes a technique to select the vision transformer's optimal input multi-resolution image patch size based on the average volume size of metastasis lesions. We further validated our suggested framework using a transfer-learning technique, demonstrating that the highest Dice similarity coefficient (DSC) performance was obtained by pre-training on training data with a larger tumour volume using the suggested ideal patch size and then training with a smaller one. We experimentally evaluate this idea through pre-training our model on a multi-resolution public dataset. Our model showed consistent and improved results when applied to our private multi-resolution mCRC dataset with a smaller average tumor volume. This study lays the groundwork for optimizing semantic segmentation of small objects using vision transformers. The implementation source code is available at:https://github.com/Ramtin-Mojtahedi/OVTPS.
    MONDEO: Multistage Botnet Detection. (arXiv:2308.16570v1 [cs.CR])
    Mobile devices have widespread to become the most used piece of technology. Due to their characteristics, they have become major targets for botnet-related malware. FluBot is one example of botnet malware that infects mobile devices. In particular, FluBot is a DNS-based botnet that uses Domain Generation Algorithms (DGA) to establish communication with the Command and Control Server (C2). MONDEO is a multistage mechanism with a flexible design to detect DNS-based botnet malware. MONDEO is lightweight and can be deployed without requiring the deployment of software, agents, or configuration in mobile devices, allowing easy integration in core networks. MONDEO comprises four detection stages: Blacklisting/Whitelisting, Query rate analysis, DGA analysis, and Machine learning evaluation. It was created with the goal of processing streams of packets to identify attacks with high efficiency, in the distinct phases. MONDEO was tested against several datasets to measure its efficiency and performance, being able to achieve high performance with RandomForest classifiers. The implementation is available at github.
    Latent Painter. (arXiv:2308.16490v1 [cs.CV])
    Latent diffusers revolutionized the generative AI and inspired creative art. When denoising the latent, the predicted original image at each step collectively animates the formation. However, the animation is limited by the denoising nature of the diffuser, and only renders a sharpening process. This work presents Latent Painter, which uses the latent as the canvas, and the diffuser predictions as the plan, to generate painting animation. Latent Painter also transits one generated image to another, which can happen between images from two different sets of checkpoints.
    Constructing Indoor Region-based Radio Map without Location Labels. (arXiv:2308.16759v1 [cs.LG])
    Radio map construction requires a large amount of radio measurement data with location labels, which imposes a high deployment cost. This paper develops a region-based radio map from received signal strength (RSS) measurements without location labels. The construction is based on a set of blindly collected RSS measurement data from a device that visits each region in an indoor area exactly once, where the footprints and timestamps are not recorded. The main challenge is to cluster the RSS data and match clusters with the physical regions. Classical clustering algorithms fail to work as the RSS data naturally appears as non-clustered due to multipaths and noise. In this paper, a signal subspace model with a sequential prior is constructed for the RSS data, and an integrated segmentation and clustering algorithm is developed, which is shown to find the globally optimal solution in a special case. Furthermore, the clustered data is matched with the physical regions using a graph-based approach. Based on real measurements from an office space, the proposed scheme reduces the region localization error by roughly 50% compared to a weighted centroid localization (WCL) baseline, and it even outperforms some supervised localization schemes, including k-nearest neighbor (KNN), support vector machine (SVM), and deep neural network (DNN), which require labeled data for training.
    Scalable Incomplete Multi-View Clustering with Structure Alignment. (arXiv:2308.16541v1 [cs.LG])
    The success of existing multi-view clustering (MVC) relies on the assumption that all views are complete. However, samples are usually partially available due to data corruption or sensor malfunction, which raises the research of incomplete multi-view clustering (IMVC). Although several anchor-based IMVC methods have been proposed to process the large-scale incomplete data, they still suffer from the following drawbacks: i) Most existing approaches neglect the inter-view discrepancy and enforce cross-view representation to be consistent, which would corrupt the representation capability of the model; ii) Due to the samples disparity between different views, the learned anchor might be misaligned, which we referred as the Anchor-Unaligned Problem for Incomplete data (AUP-ID). Such the AUP-ID would cause inaccurate graph fusion and degrades clustering performance. To tackle these issues, we propose a novel incomplete anchor graph learning framework termed Scalable Incomplete Multi-View Clustering with Structure Alignment (SIMVC-SA). Specially, we construct the view-specific anchor graph to capture the complementary information from different views. In order to solve the AUP-ID, we propose a novel structure alignment module to refine the cross-view anchor correspondence. Meanwhile, the anchor graph construction and alignment are jointly optimized in our unified framework to enhance clustering quality. Through anchor graph construction instead of full graphs, the time and space complexity of the proposed SIMVC-SA is proven to be linearly correlated with the number of samples. Extensive experiments on seven incomplete benchmark datasets demonstrate the effectiveness and efficiency of our proposed method. Our code is publicly available at https://github.com/wy1019/SIMVC-SA.
    CL-MAE: Curriculum-Learned Masked Autoencoders. (arXiv:2308.16572v1 [cs.CV])
    Masked image modeling has been demonstrated as a powerful pretext task for generating robust representations that can be effectively generalized across multiple downstream tasks. Typically, this approach involves randomly masking patches (tokens) in input images, with the masking strategy remaining unchanged during training. In this paper, we propose a curriculum learning approach that updates the masking strategy to continually increase the complexity of the self-supervised reconstruction task. We conjecture that, by gradually increasing the task complexity, the model can learn more sophisticated and transferable representations. To facilitate this, we introduce a novel learnable masking module that possesses the capability to generate masks of different complexities, and integrate the proposed module into masked autoencoders (MAE). Our module is jointly trained with the MAE, while adjusting its behavior during training, transitioning from a partner to the MAE (optimizing the same reconstruction loss) to an adversary (optimizing the opposite loss), while passing through a neutral state. The transition between these behaviors is smooth, being regulated by a factor that is multiplied with the reconstruction loss of the masking module. The resulting training procedure generates an easy-to-hard curriculum. We train our Curriculum-Learned Masked Autoencoder (CL-MAE) on ImageNet and show that it exhibits superior representation learning capabilities compared to MAE. The empirical results on five downstream tasks confirm our conjecture, demonstrating that curriculum learning can be successfully used to self-supervise masked autoencoders.
    What can we learn from quantum convolutional neural networks?. (arXiv:2308.16664v1 [quant-ph])
    We can learn from analyzing quantum convolutional neural networks (QCNNs) that: 1) working with quantum data can be perceived as embedding physical system parameters through a hidden feature map; 2) their high performance for quantum phase recognition can be attributed to generation of a very suitable basis set during the ground state embedding, where quantum criticality of spin models leads to basis functions with rapidly changing features; 3) pooling layers of QCNNs are responsible for picking those basis functions that can contribute to forming a high-performing decision boundary, and the learning process corresponds to adapting the measurement such that few-qubit operators are mapped to full-register observables; 4) generalization of QCNN models strongly depends on the embedding type, and that rotation-based feature maps with the Fourier basis require careful feature engineering; 5) accuracy and generalization of QCNNs with readout based on a limited number of shots favor the ground state embeddings and associated physics-informed models. We demonstrate these points in simulation, where our results shed light on classification for physical processes, relevant for applications in sensing. Finally, we show that QCNNs with properly chosen ground state embeddings can be used for fluid dynamics problems, expressing shock wave solutions with good generalization and proven trainability.
    Document Layout Analysis on BaDLAD Dataset: A Comprehensive MViTv2 Based Approach. (arXiv:2308.16571v1 [cs.CV])
    In the rapidly evolving digital era, the analysis of document layouts plays a pivotal role in automated information extraction and interpretation. In our work, we have trained MViTv2 transformer model architecture with cascaded mask R-CNN on BaDLAD dataset to extract text box, paragraphs, images and tables from a document. After training on 20365 document images for 36 epochs in a 3 phase cycle, we achieved a training loss of 0.2125 and a mask loss of 0.19. Our work extends beyond training, delving into the exploration of potential enhancement avenues. We investigate the impact of rotation and flip augmentation, the effectiveness of slicing input images pre-inference, the implications of varying the resolution of the transformer backbone, and the potential of employing a dual-pass inference to uncover missed text-boxes. Through these explorations, we observe a spectrum of outcomes, where some modifications result in tangible performance improvements, while others offer unique insights for future endeavors.
    BioCoder: A Benchmark for Bioinformatics Code Generation with Contextual Pragmatic Knowledge. (arXiv:2308.16458v1 [cs.LG])
    Pre-trained language models like ChatGPT have significantly improved code generation. As these models scale up, there is an increasing need for the output to handle more intricate tasks. Moreover, in bioinformatics, generating functional programs poses additional notable challenges due to the amount of domain knowledge, the need for complicated data operations, and intricate functional dependencies between the operations. Here, we present BioCoder, a benchmark developed to evaluate existing pre-trained models in generating bioinformatics code. In relation to function-code generation, BioCoder covers potential package dependencies, class declarations, and global variables. It incorporates 1026 functions and 1243 methods in Python and Java from GitHub and 253 examples from the Rosalind Project. BioCoder incorporates a fuzz-testing framework for evaluation, and we have applied it to evaluate many models including InCoder, CodeGen, CodeGen2, SantaCoder, StarCoder, StarCoder+, InstructCodeT5+, and ChatGPT. Our detailed analysis of these models emphasizes the importance of domain knowledge, pragmatic code generation, and contextual understanding. Our dataset, benchmark, Docker images, and scripts required for testing are all available at https://github.com/gersteinlab/biocoder.
    Towards Long-Tailed Recognition for Graph Classification via Collaborative Experts. (arXiv:2308.16609v1 [cs.LG])
    Graph classification, aiming at learning the graph-level representations for effective class assignments, has received outstanding achievements, which heavily relies on high-quality datasets that have balanced class distribution. In fact, most real-world graph data naturally presents a long-tailed form, where the head classes occupy much more samples than the tail classes, it thus is essential to study the graph-level classification over long-tailed data while still remaining largely unexplored. However, most existing long-tailed learning methods in visions fail to jointly optimize the representation learning and classifier training, as well as neglect the mining of the hard-to-classify classes. Directly applying existing methods to graphs may lead to sub-optimal performance, since the model trained on graphs would be more sensitive to the long-tailed distribution due to the complex topological characteristics. Hence, in this paper, we propose a novel long-tailed graph-level classification framework via Collaborative Multi-expert Learning (CoMe) to tackle the problem. To equilibrate the contributions of head and tail classes, we first develop balanced contrastive learning from the view of representation learning, and then design an individual-expert classifier training based on hard class mining. In addition, we execute gated fusion and disentangled knowledge distillation among the multiple experts to promote the collaboration in a multi-expert framework. Comprehensive experiments are performed on seven widely-used benchmark datasets to demonstrate the superiority of our method CoMe over state-of-the-art baselines.
    Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis. (arXiv:2308.16593v1 [cs.SD])
    The spontaneous behavior that often occurs in conversations makes speech more human-like compared to reading-style. However, synthesizing spontaneous-style speech is challenging due to the lack of high-quality spontaneous datasets and the high cost of labeling spontaneous behavior. In this paper, we propose a semi-supervised pre-training method to increase the amount of spontaneous-style speech and spontaneous behavioral labels. In the process of semi-supervised learning, both text and speech information are considered for detecting spontaneous behaviors labels in speech. Moreover, a linguistic-aware encoder is used to model the relationship between each sentence in the conversation. Experimental results indicate that our proposed method achieves superior expressive speech synthesis performance with the ability to model spontaneous behavior in spontaneous-style speech and predict reasonable spontaneous behavior from text.
    SA6D: Self-Adaptive Few-Shot 6D Pose Estimator for Novel and Occluded Objects. (arXiv:2308.16528v1 [cs.CV])
    To enable meaningful robotic manipulation of objects in the real-world, 6D pose estimation is one of the critical aspects. Most existing approaches have difficulties to extend predictions to scenarios where novel object instances are continuously introduced, especially with heavy occlusions. In this work, we propose a few-shot pose estimation (FSPE) approach called SA6D, which uses a self-adaptive segmentation module to identify the novel target object and construct a point cloud model of the target object using only a small number of cluttered reference images. Unlike existing methods, SA6D does not require object-centric reference images or any additional object information, making it a more generalizable and scalable solution across categories. We evaluate SA6D on real-world tabletop object datasets and demonstrate that SA6D outperforms existing FSPE methods, particularly in cluttered scenes with occlusions, while requiring fewer reference images.
    Transformer-based interpretable multi-modal data fusion for skin lesion classification. (arXiv:2304.14505v2 [eess.IV] UPDATED)
    A lot of deep learning (DL) research these days is mainly focused on improving quantitative metrics regardless of other factors. In human-centered applications, like skin lesion classification in dermatology, DL-driven clinical decision support systems are still in their infancy due to the limited transparency of their decision-making process. Moreover, the lack of procedures that can explain the behavior of trained DL algorithms leads to almost no trust from clinical physicians. To diagnose skin lesions, dermatologists rely on visual assessment of the disease and the data gathered from the patient's anamnesis. Data-driven algorithms dealing with multi-modal data are limited by the separation of feature-level and decision-level fusion procedures required by convolutional architectures. To address this issue, we enable single-stage multi-modal data fusion via the attention mechanism of transformer-based architectures to aid in diagnosing skin diseases. Our method beats other state-of-the-art single- and multi-modal DL architectures in image-rich and patient-data-rich environments. Additionally, the choice of the architecture enables native interpretability support for the classification task both in the image and metadata domain with no additional modifications necessary.
    On a Connection between Differential Games, Optimal Control, and Energy-based Models for Multi-Agent Interactions. (arXiv:2308.16539v1 [cs.RO])
    Game theory offers an interpretable mathematical framework for modeling multi-agent interactions. However, its applicability in real-world robotics applications is hindered by several challenges, such as unknown agents' preferences and goals. To address these challenges, we show a connection between differential games, optimal control, and energy-based models and demonstrate how existing approaches can be unified under our proposed Energy-based Potential Game formulation. Building upon this formulation, this work introduces a new end-to-end learning application that combines neural networks for game-parameter inference with a differentiable game-theoretic optimization layer, acting as an inductive bias. The experiments using simulated mobile robot pedestrian interactions and real-world automated driving data provide empirical evidence that the game-theoretic layer improves the predictive performance of various neural network backbones.
    Domain-adaptive Message Passing Graph Neural Network. (arXiv:2308.16470v1 [cs.LG])
    Cross-network node classification (CNNC), which aims to classify nodes in a label-deficient target network by transferring the knowledge from a source network with abundant labels, draws increasing attention recently. To address CNNC, we propose a domain-adaptive message passing graph neural network (DM-GNN), which integrates graph neural network (GNN) with conditional adversarial domain adaptation. DM-GNN is capable of learning informative representations for node classification that are also transferrable across networks. Firstly, a GNN encoder is constructed by dual feature extractors to separate ego-embedding learning from neighbor-embedding learning so as to jointly capture commonality and discrimination between connected nodes. Secondly, a label propagation node classifier is proposed to refine each node's label prediction by combining its own prediction and its neighbors' prediction. In addition, a label-aware propagation scheme is devised for the labeled source network to promote intra-class propagation while avoiding inter-class propagation, thus yielding label-discriminative source embeddings. Thirdly, conditional adversarial domain adaptation is performed to take the neighborhood-refined class-label information into account during adversarial domain adaptation, so that the class-conditional distributions across networks can be better matched. Comparisons with eleven state-of-the-art methods demonstrate the effectiveness of the proposed DM-GNN.
    Point-TTA: Test-Time Adaptation for Point Cloud Registration Using Multitask Meta-Auxiliary Learning. (arXiv:2308.16481v1 [cs.CV])
    We present Point-TTA, a novel test-time adaptation framework for point cloud registration (PCR) that improves the generalization and the performance of registration models. While learning-based approaches have achieved impressive progress, generalization to unknown testing environments remains a major challenge due to the variations in 3D scans. Existing methods typically train a generic model and the same trained model is applied on each instance during testing. This could be sub-optimal since it is difficult for the same model to handle all the variations during testing. In this paper, we propose a test-time adaptation approach for PCR. Our model can adapt to unseen distributions at test-time without requiring any prior knowledge of the test data. Concretely, we design three self-supervised auxiliary tasks that are optimized jointly with the primary PCR task. Given a test instance, we adapt our model using these auxiliary tasks and the updated model is used to perform the inference. During training, our model is trained using a meta-auxiliary learning approach, such that the adapted model via auxiliary tasks improves the accuracy of the primary task. Experimental results demonstrate the effectiveness of our approach in improving generalization of point cloud registration and outperforming other state-of-the-art approaches.
    Test-Time Adaptation for Point Cloud Upsampling Using Meta-Learning. (arXiv:2308.16484v1 [cs.CV])
    Affordable 3D scanners often produce sparse and non-uniform point clouds that negatively impact downstream applications in robotic systems. While existing point cloud upsampling architectures have demonstrated promising results on standard benchmarks, they tend to experience significant performance drops when the test data have different distributions from the training data. To address this issue, this paper proposes a test-time adaption approach to enhance model generality of point cloud upsampling. The proposed approach leverages meta-learning to explicitly learn network parameters for test-time adaption. Our method does not require any prior information about the test data. During meta-training, the model parameters are learned from a collection of instance-level tasks, each of which consists of a sparse-dense pair of point clouds from the training data. During meta-testing, the trained model is fine-tuned with a few gradient updates to produce a unique set of network parameters for each test instance. The updated model is then used for the final prediction. Our framework is generic and can be applied in a plug-and-play manner with existing backbone networks in point cloud upsampling. Extensive experiments demonstrate that our approach improves the performance of state-of-the-art models.
    BenchTemp: A General Benchmark for Evaluating Temporal Graph Neural Networks. (arXiv:2308.16385v1 [cs.LG])
    To handle graphs in which features or connectivities are evolving over time, a series of temporal graph neural networks (TGNNs) have been proposed. Despite the success of these TGNNs, the previous TGNN evaluations reveal several limitations regarding four critical issues: 1) inconsistent datasets, 2) inconsistent evaluation pipelines, 3) lacking workload diversity, and 4) lacking efficient comparison. Overall, there lacks an empirical study that puts TGNN models onto the same ground and compares them comprehensively. To this end, we propose BenchTemp, a general benchmark for evaluating TGNN models on various workloads. BenchTemp provides a set of benchmark datasets so that different TGNN models can be fairly compared. Further, BenchTemp engineers a standard pipeline that unifies the TGNN evaluation. With BenchTemp, we extensively compare the representative TGNN models on different tasks (e.g., link prediction and node classification) and settings (transductive and inductive), w.r.t. both effectiveness and efficiency metrics. We have made BenchTemp publicly available at https://github.com/qianghuangwhu/benchtemp.
    In-class Data Analysis Replications: Teaching Students while Testing Science. (arXiv:2308.16491v1 [cs.CY])
    Science is facing a reproducibility crisis. Previous work has proposed incorporating data analysis replications into classrooms as a potential solution. However, despite the potential benefits, it is unclear whether this approach is feasible, and if so, what the involved stakeholders-students, educators, and scientists-should expect from it. Can students perform a data analysis replication over the course of a class? What are the costs and benefits for educators? And how can this solution help benchmark and improve the state of science? In the present study, we incorporated data analysis replications in the project component of the Applied Data Analysis course (CS-401) taught at EPFL (N=354 students). Here we report pre-registered findings based on surveys administered throughout the course. First, we demonstrate that students can replicate previously published scientific papers, most of them qualitatively and some exactly. We find discrepancies between what students expect of data analysis replications and what they experience by doing them along with changes in expectations about reproducibility, which together serve as evidence of attitude shifts to foster students' critical thinking. Second, we provide information for educators about how much overhead is needed to incorporate replications into the classroom and identify concerns that replications bring as compared to more traditional assignments. Third, we identify tangible benefits of the in-class data analysis replications for scientific communities, such as a collection of replication reports and insights about replication barriers in scientific work that should be avoided going forward. Overall, we demonstrate that incorporating replication tasks into a large data science class can increase the reproducibility of scientific work as a by-product of data science instruction, thus benefiting both science and students.
    Backpropagation through Back Substitution with a Backslash. (arXiv:2303.15449v2 [math.NA] UPDATED)
    We present a linear algebra formulation of backpropagation which allows the calculation of gradients by using a generically written ``backslash'' or Gaussian elimination on triangular systems of equations. Generally, the matrix elements are operators. This paper has three contributions: (i) it is of intellectual value to replace traditional treatments of automatic differentiation with a (left acting) operator theoretic, graph-based approach; (ii) operators can be readily placed in matrices in software in programming languages such as Julia as an implementation option; (iii) we introduce a novel notation, ``transpose dot'' operator ``$\{\}^{T_\bullet}$'' that allows for the reversal of operators. We further demonstrate the elegance of the operators approach in a suitable programming language consisting of generic linear algebra operators such as Julia \cite{bezanson2017julia}, and that it is possible to realize this abstraction in code. Our implementation shows how generic linear algebra can allow operators as elements of matrices. In contrast to ``operator overloading,'' where backslash would normally have to be rewritten to take advantage of operators, with ``generic programming'' there is no such need.
    Knowledge Enhanced Graph Neural Networks for Graph Completion. (arXiv:2303.15487v3 [cs.AI] UPDATED)
    Graph data is omnipresent and has a wide variety of applications, such as in natural science, social networks, or the semantic web. However, while being rich in information, graphs are often noisy and incomplete. As a result, graph completion tasks, such as node classification or link prediction, have gained attention. On one hand, neural methods, such as graph neural networks, have proven to be robust tools for learning rich representations of noisy graphs. On the other hand, symbolic methods enable exact reasoning on graphs.We propose Knowledge Enhanced Graph Neural Networks (KeGNN), a neuro-symbolic framework for graph completion that combines both paradigms as it allows for the integration of prior knowledge into a graph neural network model.Essentially, KeGNN consists of a graph neural network as a base upon which knowledge enhancement layers are stacked with the goal of refining predictions with respect to prior knowledge.We instantiate KeGNN in conjunction with two state-of-the-art graph neural networks, Graph Convolutional Networks and Graph Attention Networks, and evaluate KeGNN on multiple benchmark datasets for node classification.
    Invertible normalizing flow neural networks by JKO scheme. (arXiv:2212.14424v2 [stat.ML] UPDATED)
    Normalizing flow is a class of deep generative models for efficient sampling and density estimation. In practice, the flow often appears as a chain of invertible neural network blocks; to facilitate training, existing works have regularized flow trajectories and designed special network architectures. The current paper develops a neural ODE flow network inspired by the Jordan-Kinderleherer-Otto (JKO) scheme, which allows efficient block-wise training of the residual blocks without sampling SDE trajectories or inner loops of score matching or variational learning. As the JKO scheme unfolds the dynamic of gradient flow, the proposed model naturally stacks residual network blocks one by one, reducing the memory load and difficulty in performing end-to-end deep flow network training. We also develop adaptive time reparameterization of the flow network with a progressive refinement of the trajectory in probability space, which improves the model training efficiency and accuracy in practice. Using numerical experiments with synthetic and real data, we show that the proposed JKO-iFlow model achieves similar or better performance in generating new samples compared with the existing flow and diffusion models at a significantly reduced computational and memory cost.
    Neuronal diversity can improve machine learning for physics and beyond. (arXiv:2204.04348v3 [cs.LG] UPDATED)
    Diversity conveys advantages in nature, yet homogeneous neurons typically comprise the layers of artificial neural networks. Here we construct neural networks from neurons that learn their own activation functions, quickly diversify, and subsequently outperform their homogeneous counterparts on image classification and nonlinear regression tasks. Sub-networks instantiate the neurons, which meta-learn especially efficient sets of nonlinear responses. Examples include conventional neural networks classifying digits and forecasting a van der Pol oscillator and physics-informed Hamiltonian neural networks learning H\'enon-Heiles stellar orbits and the swing of a video recorded pendulum clock. Such \textit{learned diversity} provides examples of dynamical systems selecting diversity over uniformity and elucidates the role of diversity in natural and artificial systems.
    System identification of neural systems: If we got it right, would we know?. (arXiv:2302.06677v2 [q-bio.NC] UPDATED)
    Artificial neural networks are being proposed as models of parts of the brain. The networks are compared to recordings of biological neurons, and good performance in reproducing neural responses is considered to support the model's validity. A key question is how much this system identification approach tells us about brain computation. Does it validate one model architecture over another? We evaluate the most commonly used comparison techniques, such as a linear encoding model and centered kernel alignment, to correctly identify a model by replacing brain recordings with known ground truth models. System identification performance is quite variable; it also depends significantly on factors independent of the ground truth architecture, such as stimuli images. In addition, we show the limitations of using functional similarity scores in identifying higher-level architectural motifs.
    Balancing between the Local and Global Structures (LGS) in Graph Embedding. (arXiv:2308.16403v1 [cs.HC])
    We present a method for balancing between the Local and Global Structures (LGS) in graph embedding, via a tunable parameter. Some embedding methods aim to capture global structures, while others attempt to preserve local neighborhoods. Few methods attempt to do both, and it is not always possible to capture well both local and global information in two dimensions, which is where most graph drawing live. The choice of using a local or a global embedding for visualization depends not only on the task but also on the structure of the underlying data, which may not be known in advance. For a given graph, LGS aims to find a good balance between the local and global structure to preserve. We evaluate the performance of LGS with synthetic and real-world datasets and our results indicate that it is competitive with the state-of-the-art methods, using established quality metrics such as stress and neighborhood preservation. We introduce a novel quality metric, cluster distance preservation, to assess intermediate structure capture. All source-code, datasets, experiments and analysis are available online.
    Listen to Minority: Encrypted Traffic Classification for Class Imbalance with Contrastive Pre-Training. (arXiv:2308.16453v1 [cs.CR])
    Mobile Internet has profoundly reshaped modern lifestyles in various aspects. Encrypted Traffic Classification (ETC) naturally plays a crucial role in managing mobile Internet, especially with the explosive growth of mobile apps using encrypted communication. Despite some existing learning-based ETC methods showing promising results, three-fold limitations still remain in real-world network environments, 1) label bias caused by traffic class imbalance, 2) traffic homogeneity caused by component sharing, and 3) training with reliance on sufficient labeled traffic. None of the existing ETC methods can address all these limitations. In this paper, we propose a novel Pre-trAining Semi-Supervised ETC framework, dubbed PASS. Our key insight is to resample the original train dataset and perform contrastive pre-training without using individual app labels directly to avoid label bias issues caused by class imbalance, while obtaining a robust feature representation to differentiate overlapping homogeneous traffic by pulling positive traffic pairs closer and pushing negative pairs away. Meanwhile, PASS designs a semi-supervised optimization strategy based on pseudo-label iteration and dynamic loss weighting algorithms in order to effectively utilize massive unlabeled traffic data and alleviate manual train dataset annotation workload. PASS outperforms state-of-the-art ETC methods and generic sampling approaches on four public datasets with significant class imbalance and traffic homogeneity, remarkably pushing the F1 of Cross-Platform215 with 1.31%, ISCX-17 with 9.12%. Furthermore, we validate the generality of the contrastive pre-training and pseudo-label iteration components of PASS, which can adaptively benefit ETC methods with diverse feature extractors.
    Least Squares Maximum and Weighted Generalization-Memorization Machines. (arXiv:2308.16456v1 [stat.ML])
    In this paper, we propose a new way of remembering by introducing a memory influence mechanism for the least squares support vector machine (LSSVM). Without changing the equation constraints of the original LSSVM, this mechanism, allows an accurate partitioning of the training set without overfitting. The maximum memory impact model (MIMM) and the weighted impact memory model (WIMM) are then proposed. It is demonstrated that these models can be degraded to the LSSVM. Furthermore, we propose some different memory impact functions for the MIMM and WIMM. The experimental results show that that our MIMM and WIMM have better generalization performance compared to the LSSVM and significant advantage in time cost compared to other memory models.
    Computing excited states of molecules using normalizing flows. (arXiv:2308.16468v1 [physics.chem-ph])
    We present a new nonlinear variational framework for simultaneously computing ground and excited states of quantum systems. Our approach is based on approximating wavefunctions in the linear span of basis functions that are augmented and optimized \emph{via} composition with normalizing flows. The accuracy and efficiency of our approach are demonstrated in the calculations of a large number of vibrational states of the triatomic H$_2$S molecule as well as ground and several excited electronic states of prototypical one-electron systems including the hydrogen atom, the molecular hydrogen ion, and a carbon atom in a single-active-electron approximation. The results demonstrate significant improvements in the accuracy of energy predictions and accelerated basis-set convergence even when using normalizing flows with a small number of parameters. The present approach can be also seen as the optimization of a set of intrinsic coordinates that best capture the underlying physics within the given basis set.
    Materials Informatics Transformer: A Language Model for Interpretable Materials Properties Prediction. (arXiv:2308.16259v1 [cs.LG])
    Recently, the remarkable capabilities of large language models (LLMs) have been illustrated across a variety of research domains such as natural language processing, computer vision, and molecular modeling. We extend this paradigm by utilizing LLMs for material property prediction by introducing our model Materials Informatics Transformer (MatInFormer). Specifically, we introduce a novel approach that involves learning the grammar of crystallography through the tokenization of pertinent space group information. We further illustrate the adaptability of MatInFormer by incorporating task-specific data pertaining to Metal-Organic Frameworks (MOFs). Through attention visualization, we uncover the key features that the model prioritizes during property prediction. The effectiveness of our proposed model is empirically validated across 14 distinct datasets, hereby underscoring its potential for high throughput screening through accurate material property prediction.
    On the Equivalence between Implicit and Explicit Neural Networks: A High-dimensional Viewpoint. (arXiv:2308.16425v1 [cs.LG])
    Implicit neural networks have demonstrated remarkable success in various tasks. However, there is a lack of theoretical analysis of the connections and differences between implicit and explicit networks. In this paper, we study high-dimensional implicit neural networks and provide the high dimensional equivalents for the corresponding conjugate kernels and neural tangent kernels. Built upon this, we establish the equivalence between implicit and explicit networks in high dimensions.
    ToddlerBERTa: Exploiting BabyBERTa for Grammar Learning and Language Understanding. (arXiv:2308.16336v1 [cs.CL])
    We present ToddlerBERTa, a BabyBERTa-like language model, exploring its capabilities through five different models with varied hyperparameters. Evaluating on BLiMP, SuperGLUE, MSGS, and a Supplement benchmark from the BabyLM challenge, we find that smaller models can excel in specific tasks, while larger models perform well with substantial data. Despite training on a smaller dataset, ToddlerBERTa demonstrates commendable performance, rivalling the state-of-the-art RoBERTa-base. The model showcases robust language understanding, even with single-sentence pretraining, and competes with baselines that leverage broader contextual information. Our work provides insights into hyperparameter choices, and data utilization, contributing to the advancement of language models.
    A Unified Analysis for the Subgradient Methods Minimizing Composite Nonconvex, Nonsmooth and Non-Lipschitz Functions. (arXiv:2308.16362v1 [math.OC])
    In this paper we propose a proximal subgradient method (Prox-SubGrad) for solving nonconvex and nonsmooth optimization problems without assuming Lipschitz continuity conditions. A number of subgradient upper bounds and their relationships are presented. By means of these upper bounding conditions, we establish some uniform recursive relations for the Moreau envelopes for weakly convex optimization. This uniform scheme simplifies and unifies the proof schemes to establish rate of convergence for Prox-SubGrad without assuming Lipschitz continuity. We present a novel convergence analysis in this context. Furthermore, we propose some new stochastic subgradient upper bounding conditions and establish convergence and iteration complexity rates for the stochastic subgradient method (Sto-SubGrad) to solve non-Lipschitz and nonsmooth stochastic optimization problems. In particular, for both deterministic and stochastic subgradient methods on weakly convex optimization problems without Lipschitz continuity, under any of the subgradient upper bounding conditions to be introduced in the paper, we show that $O(1/\sqrt{T})$ convergence rate holds in terms of the square of gradient of the Moreau envelope function, which further improves to be $O(1/{T})$ if, in addition, the uniform KL condition with exponent $1/2$ holds.
    Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff. (arXiv:2308.16454v1 [cs.CV])
    This paper addresses the tradeoff between standard accuracy on clean examples and robustness against adversarial examples in deep neural networks (DNNs). Although adversarial training (AT) improves robustness, it degrades the standard accuracy, thus yielding the tradeoff. To mitigate this tradeoff, we propose a novel AT method called ARREST, which comprises three components: (i) adversarial finetuning (AFT), (ii) representation-guided knowledge distillation (RGKD), and (iii) noisy replay (NR). AFT trains a DNN on adversarial examples by initializing its parameters with a DNN that is standardly pretrained on clean examples. RGKD and NR respectively entail a regularization term and an algorithm to preserve latent representations of clean examples during AFT. RGKD penalizes the distance between the representations of the standardly pretrained and AFT DNNs. NR switches input adversarial examples to nonadversarial ones when the representation changes significantly during AFT. By combining these components, ARREST achieves both high standard accuracy and robustness. Experimental results demonstrate that ARREST mitigates the tradeoff more effectively than previous AT-based methods do.
    Ten Years of Generative Adversarial Nets (GANs): A survey of the state-of-the-art. (arXiv:2308.16316v1 [cs.LG])
    Since their inception in 2014, Generative Adversarial Networks (GANs) have rapidly emerged as powerful tools for generating realistic and diverse data across various domains, including computer vision and other applied areas. Consisting of a discriminative network and a generative network engaged in a Minimax game, GANs have revolutionized the field of generative modeling. In February 2018, GAN secured the leading spot on the ``Top Ten Global Breakthrough Technologies List'' issued by the Massachusetts Science and Technology Review. Over the years, numerous advancements have been proposed, leading to a rich array of GAN variants, such as conditional GAN, Wasserstein GAN, CycleGAN, and StyleGAN, among many others. This survey aims to provide a general overview of GANs, summarizing the latent architecture, validation metrics, and application areas of the most widely recognized variants. We also delve into recent theoretical developments, exploring the profound connection between the adversarial principle underlying GAN and Jensen-Shannon divergence, while discussing the optimality characteristics of the GAN framework. The efficiency of GAN variants and their model architectures will be evaluated along with training obstacles as well as training solutions. In addition, a detailed discussion will be provided, examining the integration of GANs with newly developed deep learning frameworks such as Transformers, Physics-Informed Neural Networks, Large Language models, and Diffusion models. Finally, we reveal several issues as well as future research outlines in this field.
    AntM$^{2}$C: A Large Scale Dataset For Multi-Scenario Multi-Modal CTR Prediction. (arXiv:2308.16437v1 [cs.IR])
    Click-through rate (CTR) prediction is a crucial issue in recommendation systems. There has been an emergence of various public CTR datasets. However, existing datasets primarily suffer from the following limitations. Firstly, users generally click different types of items from multiple scenarios, and modeling from multiple scenarios can provide a more comprehensive understanding of users. Existing datasets only include data for the same type of items from a single scenario. Secondly, multi-modal features are essential in multi-scenario prediction as they address the issue of inconsistent ID encoding between different scenarios. The existing datasets are based on ID features and lack multi-modal features. Third, a large-scale dataset can provide a more reliable evaluation of models, fully reflecting the performance differences between models. The scale of existing datasets is around 100 million, which is relatively small compared to the real-world CTR prediction. To address these limitations, we propose AntM$^{2}$C, a Multi-Scenario Multi-Modal CTR dataset based on industrial data from Alipay. Specifically, AntM$^{2}$C provides the following advantages: 1) It covers CTR data of 5 different types of items, providing insights into the preferences of users for different items, including advertisements, vouchers, mini-programs, contents, and videos. 2) Apart from ID-based features, AntM$^{2}$C also provides 2 multi-modal features, raw text and image features, which can effectively establish connections between items with different IDs. 3) AntM$^{2}$C provides 1 billion CTR data with 200 features, including 200 million users and 6 million items. It is currently the largest-scale CTR dataset available. Based on AntM$^{2}$C, we construct several typical CTR tasks and provide comparisons with baseline methods. The dataset homepage is available at https://www.atecup.cn/home.
    Emergence of Segmentation with Minimalistic White-Box Transformers. (arXiv:2308.16271v1 [cs.CV])
    Transformer-like models for vision tasks have recently proven effective for a wide range of downstream applications such as segmentation and detection. Previous works have shown that segmentation properties emerge in vision transformers (ViTs) trained using self-supervised methods such as DINO, but not in those trained on supervised classification tasks. In this study, we probe whether segmentation emerges in transformer-based models solely as a result of intricate self-supervised learning mechanisms, or if the same emergence can be achieved under much broader conditions through proper design of the model architecture. Through extensive experimental results, we demonstrate that when employing a white-box transformer-like architecture known as CRATE, whose design explicitly models and pursues low-dimensional structures in the data distribution, segmentation properties, at both the whole and parts levels, already emerge with a minimalistic supervised training recipe. Layer-wise finer-grained analysis reveals that the emergent properties strongly corroborate the designed mathematical functions of the white-box network. Our results suggest a path to design white-box foundation models that are simultaneously highly performant and mathematically fully interpretable. Code is at \url{https://github.com/Ma-Lab-Berkeley/CRATE}.
    MASA-TCN: Multi-anchor Space-aware Temporal Convolutional Neural Networks for Continuous and Discrete EEG Emotion Recognition. (arXiv:2308.16207v1 [cs.LG])
    Emotion recognition using electroencephalogram (EEG) mainly has two scenarios: classification of the discrete labels and regression of the continuously tagged labels. Although many algorithms were proposed for classification tasks, there are only a few methods for regression tasks. For emotion regression, the label is continuous in time. A natural method is to learn the temporal dynamic patterns. In previous studies, long short-term memory (LSTM) and temporal convolutional neural networks (TCN) were utilized to learn the temporal contextual information from feature vectors of EEG. However, the spatial patterns of EEG were not effectively extracted. To enable the spatial learning ability of TCN towards better regression and classification performances, we propose a novel unified model, named MASA-TCN, for EEG emotion regression and classification tasks. The space-aware temporal layer enables TCN to additionally learn from spatial relations among EEG electrodes. Besides, a novel multi-anchor block with attentive fusion is proposed to learn dynamic temporal dependencies. Experiments on two publicly available datasets show MASA-TCN achieves higher results than the state-of-the-art methods for both EEG emotion regression and classification tasks. The code is available at https://github.com/yi-ding-cs/MASA-TCN.
    A Survey on Privacy in Graph Neural Networks: Attacks, Preservation, and Applications. (arXiv:2308.16375v1 [cs.LG])
    Graph Neural Networks (GNNs) have gained significant attention owing to their ability to handle graph-structured data and the improvement in practical applications. However, many of these models prioritize high utility performance, such as accuracy, with a lack of privacy consideration, which is a major concern in modern society where privacy attacks are rampant. To address this issue, researchers have started to develop privacy-preserving GNNs. Despite this progress, there is a lack of a comprehensive overview of the attacks and the techniques for preserving privacy in the graph domain. In this survey, we aim to address this gap by summarizing the attacks on graph data according to the targeted information, categorizing the privacy preservation techniques in GNNs, and reviewing the datasets and applications that could be used for analyzing/solving privacy issues in GNNs. We also outline potential directions for future research in order to build better privacy-preserving GNNs.
    GRASP: A Goodness-of-Fit Test for Classification Learning. (arXiv:2209.02064v2 [stat.ME] UPDATED)
    Performance of classifiers is often measured in terms of average accuracy on test data. Despite being a standard measure, average accuracy fails in characterizing the fit of the model to the underlying conditional law of labels given the features vector ($Y|X$), e.g. due to model misspecification, over fitting, and high-dimensionality. In this paper, we consider the fundamental problem of assessing the goodness-of-fit for a general binary classifier. Our framework does not make any parametric assumption on the conditional law $Y|X$, and treats that as a black box oracle model which can be accessed only through queries. We formulate the goodness-of-fit assessment problem as a tolerance hypothesis testing of the form \[ H_0: \mathbb{E}\Big[D_f\Big({\sf Bern}(\eta(X))\|{\sf Bern}(\hat{\eta}(X))\Big)\Big]\leq \tau\,, \] where $D_f$ represents an $f$-divergence function, and $\eta(x)$, $\hat{\eta}(x)$ respectively denote the true and an estimate likelihood for a feature vector $x$ admitting a positive label. We propose a novel test, called \grasp for testing $H_0$, which works in finite sample settings, no matter the features (distribution-free). We also propose model-X \grasp designed for model-X settings where the joint distribution of the features vector is known. Model-X \grasp uses this distributional information to achieve better power. We evaluate the performance of our tests through extensive numerical experiments.
    Learning Collaborative Information Dissemination with Graph-based Multi-Agent Reinforcement Learning. (arXiv:2308.16198v1 [cs.LG])
    In modern communication systems, efficient and reliable information dissemination is crucial for supporting critical operations across domains like disaster response, autonomous vehicles, and sensor networks. This paper introduces a Multi-Agent Reinforcement Learning (MARL) approach as a significant step forward in achieving more decentralized, efficient, and collaborative solutions. We propose a Decentralized-POMDP formulation for information dissemination, empowering each agent to independently decide on message forwarding. This constitutes a significant paradigm shift from traditional heuristics based on Multi-Point Relay (MPR) selection. Our approach harnesses Graph Convolutional Reinforcement Learning, employing Graph Attention Networks (GAT) with dynamic attention to capture essential network features. We propose two approaches, L-DGN and HL-DGN, which differ in the information that is exchanged among agents. We evaluate the performance of our decentralized approaches, by comparing them with a widely-used MPR heuristic, and we show that our trained policies are able to efficiently cover the network while bypassing the MPR set selection process. Our approach promises a first step toward bolstering the resilience of real-world broadcast communication infrastructures via learned, collaborative information dissemination.
    RetroBridge: Modeling Retrosynthesis with Markov Bridges. (arXiv:2308.16212v1 [q-bio.QM])
    Retrosynthesis planning is a fundamental challenge in chemistry which aims at designing reaction pathways from commercially available starting materials to a target molecule. Each step in multi-step retrosynthesis planning requires accurate prediction of possible precursor molecules given the target molecule and confidence estimates to guide heuristic search algorithms. We model single-step retrosynthesis planning as a distribution learning problem in a discrete state space. First, we introduce the Markov Bridge Model, a generative framework aimed to approximate the dependency between two intractable discrete distributions accessible via a finite sample of coupled data points. Our framework is based on the concept of a Markov bridge, a Markov process pinned at its endpoints. Unlike diffusion-based methods, our Markov Bridge Model does not need a tractable noise distribution as a sampling proxy and directly operates on the input product molecules as samples from the intractable prior distribution. We then address the retrosynthesis planning problem with our novel framework and introduce RetroBridge, a template-free retrosynthesis modeling approach that achieves state-of-the-art results on standard evaluation benchmarks.
    Deep Video Codec Control. (arXiv:2308.16215v1 [eess.IV])
    Lossy video compression is commonly used when transmitting and storing video data. Unified video codecs (e.g., H.264 or H.265) remain the \emph{de facto} standard, despite the availability of advanced (neural) compression approaches. Transmitting videos in the face of dynamic network bandwidth conditions requires video codecs to adapt to vastly different compression strengths. Rate control modules augment the codec's compression such that bandwidth constraints are satisfied and video distortion is minimized. While, both standard video codes and their rate control modules are developed to minimize video distortion w.r.t. human quality assessment, preserving the downstream performance of deep vision models is not considered. In this paper, we present the first end-to-end learnable deep video codec control considering both bandwidth constraints and downstream vision performance, while not breaking existing standardization. We demonstrate for two common vision tasks (semantic segmentation and optical flow estimation) and on two different datasets that our deep codec control better preserves downstream performance than using 2-pass average bit rate control while meeting dynamic bandwidth constraints and adhering to standardizations.
    A numerical approach for the fractional Laplacian via deep neural networks. (arXiv:2308.16272v1 [math.AP])
    We consider the fractional elliptic problem with Dirichlet boundary conditions on a bounded and convex domain $D$ of $\mathbb{R}^d$, with $d \geq 2$. In this paper, we perform a stochastic gradient descent algorithm that approximates the solution of the fractional problem via Deep Neural Networks. Additionally, we provide four numerical examples to test the efficiency of the algorithm, and each example will be studied for many values of $\alpha \in (1,2)$ and $d \geq 2$.
    Transformers Meet Directed Graphs. (arXiv:2302.00049v3 [cs.LG] UPDATED)
    Transformers were originally proposed as a sequence-to-sequence model for text but have become vital for a wide range of modalities, including images, audio, video, and undirected graphs. However, transformers for directed graphs are a surprisingly underexplored topic, despite their applicability to ubiquitous domains, including source code and logic circuits. In this work, we propose two direction- and structure-aware positional encodings for directed graphs: (1) the eigenvectors of the Magnetic Laplacian - a direction-aware generalization of the combinatorial Laplacian; (2) directional random walk encodings. Empirically, we show that the extra directionality information is useful in various downstream tasks, including correctness testing of sorting networks and source code understanding. Together with a data-flow-centric graph construction, our model outperforms the prior state of the art on the Open Graph Benchmark Code2 relatively by 14.7%.
    A Note on Randomized Kaczmarz Algorithm for Solving Doubly-Noisy Linear Systems. (arXiv:2308.16904v1 [math.NA])
    Large-scale linear systems, $Ax=b$, frequently arise in practice and demand effective iterative solvers. Often, these systems are noisy due to operational errors or faulty data-collection processes. In the past decade, the randomized Kaczmarz (RK) algorithm has been studied extensively as an efficient iterative solver for such systems. However, the convergence study of RK in the noisy regime is limited and considers measurement noise in the right-hand side vector, $b$. Unfortunately, in practice, that is not always the case; the coefficient matrix $A$ can also be noisy. In this paper, we analyze the convergence of RK for noisy linear systems when the coefficient matrix, $A$, is corrupted with both additive and multiplicative noise, along with the noisy vector, $b$. In our analyses, the quantity $\tilde R=\| \tilde A^{\dagger} \|_2^2 \|\tilde A \|_F^2$ influences the convergence of RK, where $\tilde A$ represents a noisy version of $A$. We claim that our analysis is robust and realistically applicable, as we do not require information about the noiseless coefficient matrix, $A$, and considering different conditions on noise, we can control the convergence of RK. We substantiate our theoretical findings by performing comprehensive numerical experiments.
    MGNN: Graph Neural Networks Inspired by Distance Geometry Problem. (arXiv:2201.12994v4 [cs.LG] UPDATED)
    Graph Neural Networks (GNNs) have emerged as a prominent research topic in the field of machine learning. Existing GNN models are commonly categorized into two types: spectral GNNs, which are designed based on polynomial graph filters, and spatial GNNs, which utilize a message-passing scheme as the foundation of the model. For the expressive power and universality of spectral GNNs, a natural approach is to improve the design of basis functions for better approximation ability. As for spatial GNNs, models like Graph Isomorphism Networks (GIN) analyze their expressive power based on Graph Isomorphism Tests. Recently, there have been attempts to establish connections between spatial GNNs and geometric concepts like curvature and cellular sheaves, as well as physical phenomena like oscillators. However, despite the recent progress, there is still a lack of comprehensive analysis regarding the universality of spatial GNNs from the perspectives of geometry and physics. In this paper, we propose MetricGNN (MGNN), a spatial GNN model inspired by the congruent-insensitivity property of classifiers in the classification phase of GNNs. We demonstrate that a GNN model is universal in the spatial domain if it can generate embedding matrices that are congruent to any given embedding matrix. This property is closely related to the Distance Geometry Problem (DGP). Since DGP is an NP-Hard combinatorial optimization problem, we propose optimizing an energy function derived from spring networks and the Multi-Dimensional Scaling (MDS) problem. This approach also allows our model to handle both homophilic and heterophilic graphs. Finally, we propose employing the iteration method to optimize our energy function. We extensively evaluate the effectiveness of our model through experiments conducted on both synthetic and real-world datasets. Our code is available at: https://github.com/GuanyuCui/MGNN.
    Joint Semantic-Native Communication and Inference via Minimal Simplicial Structures. (arXiv:2308.16789v1 [eess.SP])
    In this work, we study the problem of semantic communication and inference, in which a student agent (i.e. mobile device) queries a teacher agent (i.e. cloud sever) to generate higher-order data semantics living in a simplicial complex. Specifically, the teacher first maps its data into a k-order simplicial complex and learns its high-order correlations. For effective communication and inference, the teacher seeks minimally sufficient and invariant semantic structures prior to conveying information. These minimal simplicial structures are found via judiciously removing simplices selected by the Hodge Laplacians without compromising the inference query accuracy. Subsequently, the student locally runs its own set of queries based on a masked simplicial convolutional autoencoder (SCAE) leveraging both local and remote teacher's knowledge. Numerical results corroborate the effectiveness of the proposed approach in terms of improving inference query accuracy under different channel conditions and simplicial structures. Experiments on a coauthorship dataset show that removing simplices by ranking the Laplacian values yields a 85% reduction in payload size without sacrificing accuracy. Joint semantic communication and inference by masked SCAE improves query accuracy by 25% compared to local student based query and 15% compared to remote teacher based query. Finally, incorporating channel semantics is shown to effectively improve inference accuracy, notably at low SNR values.
    Irregular Traffic Time Series Forecasting Based on Asynchronous Spatio-Temporal Graph Convolutional Network. (arXiv:2308.16818v1 [cs.LG])
    Accurate traffic forecasting at intersections governed by intelligent traffic signals is critical for the advancement of an effective intelligent traffic signal control system. However, due to the irregular traffic time series produced by intelligent intersections, the traffic forecasting task becomes much more intractable and imposes three major new challenges: 1) asynchronous spatial dependency, 2) irregular temporal dependency among traffic data, and 3) variable-length sequence to be predicted, which severely impede the performance of current traffic forecasting methods. To this end, we propose an Asynchronous Spatio-tEmporal graph convolutional nEtwoRk (ASeer) to predict the traffic states of the lanes entering intelligent intersections in a future time window. Specifically, by linking lanes via a traffic diffusion graph, we first propose an Asynchronous Graph Diffusion Network to model the asynchronous spatial dependency between the time-misaligned traffic state measurements of lanes. After that, to capture the temporal dependency within irregular traffic state sequence, a learnable personalized time encoding is devised to embed the continuous time for each lane. Then we propose a Transformable Time-aware Convolution Network that learns meta-filters to derive time-aware convolution filters with transformable filter sizes for efficient temporal convolution on the irregular sequence. Furthermore, a Semi-Autoregressive Prediction Network consisting of a state evolution unit and a semiautoregressive predictor is designed to effectively and efficiently predict variable-length traffic state sequences. Extensive experiments on two real-world datasets demonstrate the effectiveness of ASeer in six metrics.
    Conditioning Score-Based Generative Models by Neuro-Symbolic Constraints. (arXiv:2308.16534v1 [cs.LG])
    Score-based and diffusion models have emerged as effective approaches for both conditional and unconditional generation. Still conditional generation is based on either a specific training of a conditional model or classifier guidance, which requires training a noise-dependent classifier, even when the classifier for uncorrupted data is given. We propose an approach to sample from unconditional score-based generative models enforcing arbitrary logical constraints, without any additional training. Firstly, we show how to manipulate the learned score in order to sample from an un-normalized distribution conditional on a user-defined constraint. Then, we define a flexible and numerically stable neuro-symbolic framework for encoding soft logical constraints. Combining these two ingredients we obtain a general, but approximate, conditional sampling algorithm. We further developed effective heuristics aimed at improving the approximation. Finally, we show the effectiveness of our approach for various types of constraints and data: tabular data, images and time series.
    Robust Representation Learning for Unreliable Partial Label Learning. (arXiv:2308.16718v1 [cs.LG])
    Partial Label Learning (PLL) is a type of weakly supervised learning where each training instance is assigned a set of candidate labels, but only one label is the ground-truth. However, this idealistic assumption may not always hold due to potential annotation inaccuracies, meaning the ground-truth may not be present in the candidate label set. This is known as Unreliable Partial Label Learning (UPLL) that introduces an additional complexity due to the inherent unreliability and ambiguity of partial labels, often resulting in a sub-optimal performance with existing methods. To address this challenge, we propose the Unreliability-Robust Representation Learning framework (URRL) that leverages unreliability-robust contrastive learning to help the model fortify against unreliable partial labels effectively. Concurrently, we propose a dual strategy that combines KNN-based candidate label set correction and consistency-regularization-based label disambiguation to refine label quality and enhance the ability of representation learning within the URRL framework. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art PLL methods on various datasets with diverse degrees of unreliability and ambiguity. Furthermore, we provide a theoretical analysis of our approach from the perspective of the expectation maximization (EM) algorithm. Upon acceptance, we pledge to make the code publicly accessible.
    Dynamic nsNet2: Efficient Deep Noise Suppression with Early Exiting. (arXiv:2308.16678v1 [cs.SD])
    Although deep learning has made strides in the field of deep noise suppression, leveraging deep architectures on resource-constrained devices still proved challenging. Therefore, we present an early-exiting model based on nsNet2 that provides several levels of accuracy and resource savings by halting computations at different stages. Moreover, we adapt the original architecture by splitting the information flow to take into account the injected dynamism. We show the trade-offs between performance and computational complexity based on established metrics.
    Curvature-based Pooling within Graph Neural Networks. (arXiv:2308.16516v1 [cs.LG])
    Over-squashing and over-smoothing are two critical issues, that limit the capabilities of graph neural networks (GNNs). While over-smoothing eliminates the differences between nodes making them indistinguishable, over-squashing refers to the inability of GNNs to propagate information over long distances, as exponentially many node states are squashed into fixed-size representations. Both phenomena share similar causes, as both are largely induced by the graph topology. To mitigate these problems in graph classification tasks, we propose CurvPool, a novel pooling method. CurvPool exploits the notion of curvature of a graph to adaptively identify structures responsible for both over-smoothing and over-squashing. By clustering nodes based on the Balanced Forman curvature, CurvPool constructs a graph with a more suitable structure, allowing deeper models and the combination of distant information. We compare it to other state-of-the-art pooling approaches and establish its competitiveness in terms of classification accuracy, computational complexity, and flexibility. CurvPool outperforms several comparable methods across all considered tasks. The most consistent results are achieved by pooling densely connected clusters using the sum aggregation, as this allows additional information about the size of each pool.
    A Causal Discovery Approach To Learn How Urban Form Shapes Sustainable Mobility Across Continents. (arXiv:2308.16599v1 [cs.LG])
    Global sustainability requires low-carbon urban transport systems, shaped by adequate infrastructure, deployment of low-carbon transport modes and shifts in travel behavior. To adequately implement alterations in infrastructure, it's essential to grasp the location-specific cause-and-effect mechanisms that the constructed environment has on travel. Yet, current research falls short in representing causal relationships between the 6D urban form variables and travel, generalizing across different regions, and modeling urban form effects at high spatial resolution. Here, we address all three gaps by utilizing a causal discovery and an explainable machine learning framework to detect urban form effects on intra-city travel based on high-resolution mobility data of six cities across three continents. We show that both distance to city center, demographics and density indirectly affect other urban form features. By considering the causal relationships, we find that location-specific influences align across cities, yet vary in magnitude. In addition, the spread of the city and the coverage of jobs across the city are the strongest determinants of travel-related emissions, highlighting the benefits of compact development and associated benefits. Differences in urban form effects across the cities call for a more holistic definition of 6D measures. Our work is a starting point for location-specific analysis of urban form effects on mobility behavior using causal discovery approaches, which is highly relevant for city planners and municipalities across continents.
    US-SFNet: A Spatial-Frequency Domain-based Multi-branch Network for Cervical Lymph Node Lesions Diagnoses in Ultrasound Images. (arXiv:2308.16738v1 [eess.IV])
    Ultrasound imaging serves as a pivotal tool for diagnosing cervical lymph node lesions. However, the diagnoses of these images largely hinge on the expertise of medical practitioners, rendering the process susceptible to misdiagnoses. Although rapidly developing deep learning has substantially improved the diagnoses of diverse ultrasound images, there remains a conspicuous research gap concerning cervical lymph nodes. The objective of our work is to accurately diagnose cervical lymph node lesions by leveraging a deep learning model. To this end, we first collected 3392 images containing normal lymph nodes, benign lymph node lesions, malignant primary lymph node lesions, and malignant metastatic lymph node lesions. Given that ultrasound images are generated by the reflection and scattering of sound waves across varied bodily tissues, we proposed the Conv-FFT Block. It integrates convolutional operations with the fast Fourier transform to more astutely model the images. Building upon this foundation, we designed a novel architecture, named US-SFNet. This architecture not only discerns variances in ultrasound images from the spatial domain but also adeptly captures microstructural alterations across various lesions in the frequency domain. To ascertain the potential of US-SFNet, we benchmarked it against 12 popular architectures through five-fold cross-validation. The results show that US-SFNet is SOTA and can achieve 92.89% accuracy, 90.46% precision, 89.95% sensitivity and 97.49% specificity, respectively.
    CktGNN: Circuit Graph Neural Network for Electronic Design Automation. (arXiv:2308.16406v1 [cs.LG])
    The electronic design automation of analog circuits has been a longstanding challenge in the integrated circuit field due to the huge design space and complex design trade-offs among circuit specifications. In the past decades, intensive research efforts have mostly been paid to automate the transistor sizing with a given circuit topology. By recognizing the graph nature of circuits, this paper presents a Circuit Graph Neural Network (CktGNN) that simultaneously automates the circuit topology generation and device sizing based on the encoder-dependent optimization subroutines. Particularly, CktGNN encodes circuit graphs using a two-level GNN framework (of nested GNN) where circuits are represented as combinations of subgraphs in a known subgraph basis. In this way, it significantly improves design efficiency by reducing the number of subgraphs to perform message passing. Nonetheless, another critical roadblock to advancing learning-assisted circuit design automation is a lack of public benchmarks to perform canonical assessment and reproducible research. To tackle the challenge, we introduce Open Circuit Benchmark (OCB), an open-sourced dataset that contains $10$K distinct operational amplifiers with carefully-extracted circuit specifications. OCB is also equipped with communicative circuit generation and evaluation capabilities such that it can help to generalize CktGNN to design various analog circuits by producing corresponding datasets. Experiments on OCB show the extraordinary advantages of CktGNN through representation-based optimization frameworks over other recent powerful GNN baselines and human experts' manual designs. Our work paves the way toward a learning-based open-sourced design automation for analog circuits. Our source code is available at \url{https://github.com/zehao-dong/CktGNN}.
    DECODE: DilatEd COnvolutional neural network for Detecting Extreme-mass-ratio inspirals. (arXiv:2308.16422v1 [astro-ph.IM])
    The detection of Extreme Mass Ratio Inspirals (EMRIs) is intricate due to their complex waveforms, extended duration, and low signal-to-noise ratio (SNR), making them more challenging to be identified compared to compact binary coalescences. While matched filtering-based techniques are known for their computational demands, existing deep learning-based methods primarily handle time-domain data and are often constrained by data duration and SNR. In addition, most existing work ignores time-delay interferometry (TDI) and applies the long-wavelength approximation in detector response calculations, thus limiting their ability to handle laser frequency noise. In this study, we introduce DECODE, an end-to-end model focusing on EMRI signal detection by sequence modeling in the frequency domain. Centered around a dilated causal convolutional neural network, trained on synthetic data considering TDI-1.5 detector response, DECODE can efficiently process a year's worth of multichannel TDI data with an SNR of around 50. We evaluate our model on 1-year data with accumulated SNR ranging from 50 to 120 and achieve a true positive rate of 96.3% at a false positive rate of 1%, keeping an inference time of less than 0.01 seconds. With the visualization of three showcased EMRI signals for interpretability and generalization, DECODE exhibits strong potential for future space-based gravitational wave data analyses.
    Learning Diverse Features in Vision Transformers for Improved Generalization. (arXiv:2308.16274v1 [cs.CV])
    Deep learning models often rely only on a small set of features even when there is a rich set of predictive signals in the training data. This makes models brittle and sensitive to distribution shifts. In this work, we first examine vision transformers (ViTs) and find that they tend to extract robust and spurious features with distinct attention heads. As a result of this modularity, their performance under distribution shifts can be significantly improved at test time by pruning heads corresponding to spurious features, which we demonstrate using an "oracle selection" on validation data. Second, we propose a method to further enhance the diversity and complementarity of the learned features by encouraging orthogonality of the attention heads' input gradients. We observe improved out-of-distribution performance on diagnostic benchmarks (MNIST-CIFAR, Waterbirds) as a consequence of the enhanced diversity of features and the pruning of undesirable heads.
    Emoji Promotes Developer Participation and Issue Resolution on GitHub. (arXiv:2308.16360v1 [cs.CY])
    Although remote working is increasingly adopted during the pandemic, many are concerned by the low-efficiency in the remote working. Missing in text-based communication are non-verbal cues such as facial expressions and body language, which hinders the effective communication and negatively impacts the work outcomes. Prevalent on social media platforms, emojis, as alternative non-verbal cues, are gaining popularity in the virtual workspaces well. In this paper, we study how emoji usage influences developer participation and issue resolution in virtual workspaces. To this end, we collect GitHub issues for a one-year period and apply causal inference techniques to measure the causal effect of emojis on the outcome of issues, controlling for confounders such as issue content, repository, and author information. We find that emojis can significantly reduce the resolution time of issues and attract more user participation. We also compare the heterogeneous effect on different types of issues. These findings deepen our understanding of the developer communities, and they provide design implications on how to facilitate interactions and broaden developer participation.
    A Policy Adaptation Method for Implicit Multitask Reinforcement Learning Problems. (arXiv:2308.16471v1 [cs.RO])
    In dynamic motion generation tasks, including contact and collisions, small changes in policy parameters can lead to extremely different returns. For example, in soccer, the ball can fly in completely different directions with a similar heading motion by slightly changing the hitting position or the force applied to the ball or when the friction of the ball varies. However, it is difficult to imagine that completely different skills are needed for heading a ball in different directions. In this study, we proposed a multitask reinforcement learning algorithm for adapting a policy to implicit changes in goals or environments in a single motion category with different reward functions or physical parameters of the environment. We evaluated the proposed method on the ball heading task using a monopod robot model. The results showed that the proposed method can adapt to implicit changes in the goal positions or the coefficients of restitution of the ball, whereas the standard domain randomization approach cannot cope with different task settings.
    Classification of Anomalies in Telecommunication Network KPI Time Series. (arXiv:2308.16279v1 [cs.LG])
    The increasing complexity and scale of telecommunication networks have led to a growing interest in automated anomaly detection systems. However, the classification of anomalies detected on network Key Performance Indicators (KPI) has received less attention, resulting in a lack of information about anomaly characteristics and classification processes. To address this gap, this paper proposes a modular anomaly classification framework. The framework assumes separate entities for the anomaly classifier and the detector, allowing for a distinct treatment of anomaly detection and classification tasks on time series. The objectives of this study are (1) to develop a time series simulator that generates synthetic time series resembling real-world network KPI behavior, (2) to build a detection model to identify anomalies in the time series, (3) to build classification models that accurately categorize detected anomalies into predefined classes (4) to evaluate the classification framework performance on simulated and real-world network KPI time series. This study has demonstrated the good performance of the anomaly classification models trained on simulated anomalies when applied to real-world network time series data.
    Deep Inductive Logic Programming meets Reinforcement Learning. (arXiv:2308.16210v1 [cs.LG])
    One approach to explaining the hierarchical levels of understanding within a machine learning model is the symbolic method of inductive logic programming (ILP), which is data efficient and capable of learning first-order logic rules that can entail data behaviour. A differentiable extension to ILP, so-called differentiable Neural Logic (dNL) networks, are able to learn Boolean functions as their neural architecture includes symbolic reasoning. We propose an application of dNL in the field of Relational Reinforcement Learning (RRL) to address dynamic continuous environments. This represents an extension of previous work in applying dNL-based ILP in RRL settings, as our proposed model updates the architecture to enable it to solve problems in continuous RL environments. The goal of this research is to improve upon current ILP methods for use in RRL by incorporating non-linear continuous predicates, allowing RRL agents to reason and make decisions in dynamic and continuous environments.  ( 2 min )
    Symmetry Preservation in Hamiltonian Systems: Simulation and Learning. (arXiv:2308.16331v1 [math-ph])
    This work presents a general geometric framework for simulating and learning the dynamics of Hamiltonian systems that are invariant under a Lie group of transformations. This means that a group of symmetries is known to act on the system respecting its dynamics and, as a consequence, Noether's Theorem, conserved quantities are observed. We propose to simulate and learn the mappings of interest through the construction of $G$-invariant Lagrangian submanifolds, which are pivotal objects in symplectic geometry. A notable property of our constructions is that the simulated/learned dynamics also preserves the same conserved quantities as the original system, resulting in a more faithful surrogate of the original dynamics than non-symmetry aware methods, and in a more accurate predictor of non-observed trajectories. Furthermore, our setting is able to simulate/learn not only Hamiltonian flows, but any Lie group-equivariant symplectic transformation. Our designs leverage pivotal techniques and concepts in symplectic geometry and geometric mechanics: reduction theory, Noether's Theorem, Lagrangian submanifolds, momentum mappings, and coisotropic reduction among others. We also present methods to learn Poisson transformations while preserving the underlying geometry and how to endow non-geometric integrators with geometric properties. Thus, this work presents a novel attempt to harness the power of symplectic and Poisson geometry towards simulating and learning problems.  ( 2 min )
    SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills. (arXiv:2308.16369v1 [cs.LG])
    Large Language Model (LLM) inference consists of two distinct phases - prefill phase which processes the input prompt and decode phase which generates output tokens autoregressively. While the prefill phase effectively saturates GPU compute at small batch sizes, the decode phase results in low compute utilization as it generates one token at a time per request. The varying prefill and decode times also lead to imbalance across micro-batches when using pipeline parallelism, resulting in further inefficiency due to bubbles. We present SARATHI to address these challenges. SARATHI employs chunked-prefills, which splits a prefill request into equal sized chunks, and decode-maximal batching, which constructs a batch using a single prefill chunk and populates the remaining slots with decodes. During inference, the prefill chunk saturates GPU compute, while the decode requests 'piggyback' and cost up to an order of magnitude less compared to a decode-only batch. Chunked-prefills allows constructing multiple decode-maximal batches from a single prefill request, maximizing coverage of decodes that can piggyback. Furthermore, the uniform compute design of these batches ameliorates the imbalance between micro-batches, significantly reducing pipeline bubbles. Our techniques yield significant improvements in inference performance across models and hardware. For the LLaMA-13B model on A6000 GPU, SARATHI improves decode throughput by up to 10x, and accelerates end-to-end throughput by up to 1.33x. For LLaMa-33B on A100 GPU, we achieve 1.25x higher end-to-end-throughput and up to 4.25x higher decode throughput. When used with pipeline parallelism on GPT-3, SARATHI reduces bubbles by 6.29x, resulting in an end-to-end throughput improvement of 1.91x.  ( 3 min )
  • Open

    Invertible normalizing flow neural networks by JKO scheme. (arXiv:2212.14424v2 [stat.ML] UPDATED)
    Normalizing flow is a class of deep generative models for efficient sampling and density estimation. In practice, the flow often appears as a chain of invertible neural network blocks; to facilitate training, existing works have regularized flow trajectories and designed special network architectures. The current paper develops a neural ODE flow network inspired by the Jordan-Kinderleherer-Otto (JKO) scheme, which allows efficient block-wise training of the residual blocks without sampling SDE trajectories or inner loops of score matching or variational learning. As the JKO scheme unfolds the dynamic of gradient flow, the proposed model naturally stacks residual network blocks one by one, reducing the memory load and difficulty in performing end-to-end deep flow network training. We also develop adaptive time reparameterization of the flow network with a progressive refinement of the trajectory in probability space, which improves the model training efficiency and accuracy in practice. Using numerical experiments with synthetic and real data, we show that the proposed JKO-iFlow model achieves similar or better performance in generating new samples compared with the existing flow and diffusion models at a significantly reduced computational and memory cost.
    Leveraging Image-based Generative Adversarial Networks for Time Series Generation. (arXiv:2112.08060v2 [cs.LG] UPDATED)
    Generative models for images have gained significant attention in computer vision and natural language processing due to their ability to generate realistic samples from complex data distributions. To leverage the advances of image-based generative models for the time series domain, we propose a two-dimensional image representation for time series, the Extended Intertemporal Return Plot (XIRP). Our approach captures the intertemporal time series dynamics in a scale-invariant and invertible way, reducing training time and improving sample quality. We benchmark synthetic XIRPs obtained by an off-the-shelf Wasserstein GAN with gradient penalty (WGAN-GP) to other image representations and models regarding similarity and predictive ability metrics. Our novel, validated image representation for time series consistently and significantly outperforms a state-of-the-art RNN-based generative model regarding predictive ability. Further, we introduce an improved stochastic inversion to substantially improve simulation quality regardless of the representation and provide the prospect of transfer potentials in other domains.
    Least Squares Maximum and Weighted Generalization-Memorization Machines. (arXiv:2308.16456v1 [stat.ML])
    In this paper, we propose a new way of remembering by introducing a memory influence mechanism for the least squares support vector machine (LSSVM). Without changing the equation constraints of the original LSSVM, this mechanism, allows an accurate partitioning of the training set without overfitting. The maximum memory impact model (MIMM) and the weighted impact memory model (WIMM) are then proposed. It is demonstrated that these models can be degraded to the LSSVM. Furthermore, we propose some different memory impact functions for the MIMM and WIMM. The experimental results show that that our MIMM and WIMM have better generalization performance compared to the LSSVM and significant advantage in time cost compared to other memory models.
    Temporal-spatial model via Trend Filtering. (arXiv:2308.16172v2 [stat.ME] UPDATED)
    This research focuses on the estimation of a non-parametric regression function designed for data with simultaneous time and space dependencies. In such a context, we study the Trend Filtering, a nonparametric estimator introduced by \cite{mammen1997locally} and \cite{rudin1992nonlinear}. For univariate settings, the signals we consider are assumed to have a kth weak derivative with bounded total variation, allowing for a general degree of smoothness. In the multivariate scenario, we study a $K$-Nearest Neighbor fused lasso estimator as in \cite{padilla2018adaptive}, employing an ADMM algorithm, suitable for signals with bounded variation that adhere to a piecewise Lipschitz continuity criterion. By aligning with lower bounds, the minimax optimality of our estimators is validated. A unique phase transition phenomenon, previously uncharted in Trend Filtering studies, emerges through our analysis. Both Simulation studies and real data applications underscore the superior performance of our method when compared with established techniques in the existing literature.
    StyleDiff: Attribute Comparison Between Unlabeled Datasets in Latent Disentangled Space. (arXiv:2303.05102v2 [stat.ML] UPDATED)
    One major challenge in machine learning applications is coping with mismatches between the datasets used in the development and those obtained in real-world applications. These mismatches may lead to inaccurate predictions and errors, resulting in poor product quality and unreliable systems. In this study, we propose StyleDiff to inform developers of the differences between the two datasets for the steady development of machine learning systems. Using disentangled image spaces obtained from recently proposed generative models, StyleDiff compares the two datasets by focusing on attributes in the images and provides an easy-to-understand analysis of the differences between the datasets. The proposed StyleDiff performs in $O (d N\log N)$, where $N$ is the size of the datasets and $d$ is the number of attributes, enabling the application to large datasets. We demonstrate that StyleDiff accurately detects differences between datasets and presents them in an understandable format using, for example, driving scenes datasets.
    Biclustering Methods via Sparse Penalty. (arXiv:2308.14388v2 [stat.ML] UPDATED)
    In this paper, we first reviewed several biclustering methods that are used to identify the most significant clusters in gene expression data. Here we mainly focused on the SSVD(sparse SVD) method and tried a new sparse penalty named "Prenet penalty" which has been used only in factor analysis to gain sparsity. Then in the simulation study, we tried different types of generated datasets (with different sparsity and dimension) and tried 1-layer approximation then for k-layers which shows the mixed Prenet penalty is very effective for non-overlapped data. Finally, we used some real gene expression data to show the behavior of our methods.
    Multi-Response Heteroscedastic Gaussian Process Models and Their Inference. (arXiv:2308.15370v2 [stat.ML] UPDATED)
    Despite the widespread utilization of Gaussian process models for versatile nonparametric modeling, they exhibit limitations in effectively capturing abrupt changes in function smoothness and accommodating relationships with heteroscedastic errors. Addressing these shortcomings, the heteroscedastic Gaussian process (HeGP) regression seeks to introduce flexibility by acknowledging the variability of residual variances across covariates in the regression model. In this work, we extend the HeGP concept, expanding its scope beyond regression tasks to encompass classification and state-space models. To achieve this, we propose a novel framework where the Gaussian process is coupled with a covariate-induced precision matrix process, adopting a mixture formulation. This approach enables the modeling of heteroscedastic covariance functions across covariates. To mitigate the computational challenges posed by sampling, we employ variational inference to approximate the posterior and facilitate posterior predictive modeling. Additionally, our training process leverages an EM algorithm featuring closed-form M-step updates to efficiently evaluate the heteroscedastic covariance function. A notable feature of our model is its consistent performance on multivariate responses, accommodating various types (continuous or categorical) seamlessly. Through a combination of simulations and real-world applications in climatology, we illustrate the model's prowess and advantages. By overcoming the limitations of traditional Gaussian process models, our proposed framework offers a robust and versatile tool for a wide array of applications.
    Karhunen-Lo\`eve Data Imputation in High Contrast Imaging. (arXiv:2308.16912v1 [astro-ph.IM])
    Detection and characterization of extended structures is a crucial goal in high contrast imaging. However, these structures face challenges in data reduction, leading to over-subtraction from speckles and self-subtraction with most existing methods. Iterative post-processing methods offer promising results, but their integration into existing pipelines is hindered by selective algorithms, high computational cost, and algorithmic regularization. To address this for reference differential imaging (RDI), here we propose the data imputation concept to Karhunen-Lo\`eve transform (DIKL) by modifying two steps in the standard Karhunen-Lo\`eve image projection (KLIP) method. Specifically, we partition an image to two matrices: an anchor matrix which focuses only on the speckles to obtain the DIKL coefficients, and a boat matrix which focuses on the regions of astrophysical interest for speckle removal using DIKL components. As an analytical approach, DIKL achieves high-quality results with significantly reduced computational cost (~3 orders of magnitude less than iterative methods). Being a derivative method of KLIP, DIKL is seamlessly integrable into high contrast imaging pipelines for RDI observations.  ( 3 min )
    Approximate Bayesian inference from noisy likelihoods with Gaussian process emulated MCMC. (arXiv:2104.03942v2 [stat.ME] UPDATED)
    We present a framework for approximate Bayesian inference when only a limited number of noisy log-likelihood evaluations can be obtained due to computational constraints, which is becoming increasingly common for applications of complex models. We model the log-likelihood function using a Gaussian process (GP) and the main methodological innovation is to apply this model to emulate the progression that an exact Metropolis-Hastings (MH) sampler would take if it was applicable. Informative log-likelihood evaluation locations are selected using a sequential experimental design strategy until the MH accept/reject decision is done accurately enough according to the GP model. The resulting approximate sampler is conceptually simple and sample-efficient. It is also more robust to violations of GP modelling assumptions compared with earlier, related "Bayesian optimisation-like" methods tailored for Bayesian inference. We discuss some theoretical aspects and various interpretations of the resulting approximate MH sampler, and demonstrate its benefits in the context of Bayesian and generalised Bayesian likelihood-free inference for simulator-based statistical models.  ( 2 min )
    Branches of a Tree: Taking Derivatives of Programs with Discrete and Branching Randomness in High Energy Physics. (arXiv:2308.16680v1 [stat.ML])
    We propose to apply several gradient estimation techniques to enable the differentiation of programs with discrete randomness in High Energy Physics. Such programs are common in High Energy Physics due to the presence of branching processes and clustering-based analysis. Thus differentiating such programs can open the way for gradient based optimization in the context of detector design optimization, simulator tuning, or data analysis and reconstruction optimization. We discuss several possible gradient estimation strategies, including the recent Stochastic AD method, and compare them in simplified detector design experiments. In doing so we develop, to the best of our knowledge, the first fully differentiable branching program.  ( 2 min )
    On-Demand Communication for Asynchronous Multi-Agent Bandits. (arXiv:2302.07446v2 [cs.LG] UPDATED)
    This paper studies a cooperative multi-agent multi-armed stochastic bandit problem where agents operate asynchronously -- agent pull times and rates are unknown, irregular, and heterogeneous -- and face the same instance of a K-armed bandit problem. Agents can share reward information to speed up the learning process at additional communication costs. We propose ODC, an on-demand communication protocol that tailors the communication of each pair of agents based on their empirical pull times. ODC is efficient when the pull times of agents are highly heterogeneous, and its communication complexity depends on the empirical pull times of agents. ODC is a generic protocol that can be integrated into most cooperative bandit algorithms without degrading their performance. We then incorporate ODC into the natural extensions of UCB and AAE algorithms and propose two communication-efficient cooperative algorithms. Our analysis shows that both algorithms are near-optimal in regret.  ( 2 min )
    A Stable and Scalable Method for Solving Initial Value PDEs with Neural Networks. (arXiv:2304.14994v2 [cs.LG] UPDATED)
    Unlike conventional grid and mesh based methods for solving partial differential equations (PDEs), neural networks have the potential to break the curse of dimensionality, providing approximate solutions to problems where using classical solvers is difficult or impossible. While global minimization of the PDE residual over the network parameters works well for boundary value problems, catastrophic forgetting impairs the applicability of this approach to initial value problems (IVPs). In an alternative local-in-time approach, the optimization problem can be converted into an ordinary differential equation (ODE) on the network parameters and the solution propagated forward in time; however, we demonstrate that current methods based on this approach suffer from two key issues. First, following the ODE produces an uncontrolled growth in the conditioning of the problem, ultimately leading to unacceptably large numerical errors. Second, as the ODE methods scale cubically with the number of model parameters, they are restricted to small neural networks, significantly limiting their ability to represent intricate PDE initial conditions and solutions. Building on these insights, we develop Neural IVP, an ODE based IVP solver which prevents the network from getting ill-conditioned and runs in time linear in the number of parameters, enabling us to evolve the dynamics of challenging PDEs with neural networks.  ( 3 min )
    Calibrated Explanations for Regression. (arXiv:2308.16245v1 [cs.LG])
    Artificial Intelligence (AI) is often an integral part of modern decision support systems (DSSs). The best-performing predictive models used in AI-based DSSs lack transparency. Explainable Artificial Intelligence (XAI) aims to create AI systems that can explain their rationale to human users. Local explanations in XAI can provide information about the causes of individual predictions in terms of feature importance. However, a critical drawback of existing local explanation methods is their inability to quantify the uncertainty associated with a feature's importance. This paper introduces an extension of a feature importance explanation method, Calibrated Explanations (CE), previously only supporting classification, with support for standard regression and probabilistic regression, i.e., the probability that the target is above an arbitrary threshold. The extension for regression keeps all the benefits of CE, such as calibration of the prediction from the underlying model with confidence intervals, uncertainty quantification of feature importance, and allows both factual and counterfactual explanations. CE for standard regression provides fast, reliable, stable, and robust explanations. CE for probabilistic regression provides an entirely new way of creating probabilistic explanations from any ordinary regression model and with a dynamic selection of thresholds. The performance of CE for probabilistic regression regarding stability and speed is comparable to LIME. The method is model agnostic with easily understood conditional rules. An implementation in Python is freely available on GitHub and for installation using pip making the results in this paper easily replicable.  ( 2 min )
    Multiple Augmented Reduced Rank Regression for Pan-Cancer Analysis. (arXiv:2308.16333v1 [stat.ME])
    Statistical approaches that successfully combine multiple datasets are more powerful, efficient, and scientifically informative than separate analyses. To address variation architectures correctly and comprehensively for high-dimensional data across multiple sample sets (i.e., cohorts), we propose multiple augmented reduced rank regression (maRRR), a flexible matrix regression and factorization method to concurrently learn both covariate-driven and auxiliary structured variation. We consider a structured nuclear norm objective that is motivated by random matrix theory, in which the regression or factorization terms may be shared or specific to any number of cohorts. Our framework subsumes several existing methods, such as reduced rank regression and unsupervised multi-matrix factorization approaches, and includes a promising novel approach to regression and factorization of a single dataset (aRRR) as a special case. Simulations demonstrate substantial gains in power from combining multiple datasets, and from parsimoniously accounting for all structured variation. We apply maRRR to gene expression data from multiple cancer types (i.e., pan-cancer) from TCGA, with somatic mutations as covariates. The method performs well with respect to prediction and imputation of held-out data, and provides new insights into mutation-driven and auxiliary variation that is shared or specific to certain cancer types.  ( 2 min )
    Forecasting Emergency Department Crowding with Advanced Machine Learning Models and Multivariable Input. (arXiv:2308.16544v1 [cs.LG])
    Emergency department (ED) crowding is a significant threat to patient safety and it has been repeatedly associated with increased mortality. Forecasting future service demand has the potential patient outcomes. Despite active research on the subject, several gaps remain: 1) proposed forecasting models have become outdated due to quick influx of advanced machine learning models (ML), 2) amount of multivariable input data has been limited and 3) discrete performance metrics have been rarely reported. In this study, we document the performance of a set of advanced ML models in forecasting ED occupancy 24 hours ahead. We use electronic health record data from a large, combined ED with an extensive set of explanatory variables, including the availability of beds in catchment area hospitals, traffic data from local observation stations, weather variables, etc. We show that N-BEATS and LightGBM outpeform benchmarks with 11 % and 9 % respective improvements and that DeepAR predicts next day crowding with an AUC of 0.76 (95 % CI 0.69-0.84). To the best of our knowledge, this is the first study to document the superiority of LightGBM and N-BEATS over statistical benchmarks in the context of ED forecasting.  ( 2 min )
    Hypergraph Structure Inference From Data Under Smoothness Prior. (arXiv:2308.14172v2 [cs.LG] UPDATED)
    Hypergraphs are important for processing data with higher-order relationships involving more than two entities. In scenarios where explicit hypergraphs are not readily available, it is desirable to infer a meaningful hypergraph structure from the node features to capture the intrinsic relations within the data. However, existing methods either adopt simple pre-defined rules that fail to precisely capture the distribution of the potential hypergraph structure, or learn a mapping between hypergraph structures and node features but require a large amount of labelled data, i.e., pre-existing hypergraph structures, for training. Both restrict their applications in practical scenarios. To fill this gap, we propose a novel smoothness prior that enables us to design a method to infer the probability for each potential hyperedge without labelled data as supervision. The proposed prior indicates features of nodes in a hyperedge are highly correlated by the features of the hyperedge containing them. We use this prior to derive the relation between the hypergraph structure and the node features via probabilistic modelling. This allows us to develop an unsupervised inference method to estimate the probability for each potential hyperedge via solving an optimisation problem that has an analytical solution. Experiments on both synthetic and real-world data demonstrate that our method can learn meaningful hypergraph structures from data more efficiently than existing hypergraph structure inference methods.  ( 3 min )
    A stochastic block model for community detection in attributed networks. (arXiv:2308.16382v1 [cs.SI])
    Community detection is an important content in complex network analysis. The existing community detection methods in attributed networks mostly focus on only using network structure, while the methods of integrating node attributes is mainly for the traditional community structures, and cannot detect multipartite structures and mixture structures in network. In addition, the model-based community detection methods currently proposed for attributed networks do not fully consider unique topology information of nodes, such as betweenness centrality and clustering coefficient. Therefore, a stochastic block model that integrates betweenness centrality and clustering coefficient of nodes for community detection in attributed networks, named BCSBM, is proposed in this paper. Different from other generative models for attributed networks, the generation process of links and attributes in BCSBM model follows the Poisson distribution, and the probability between community is considered based on the stochastic block model. Moreover, the betweenness centrality and clustering coefficient of nodes are introduced into the process of links and attributes generation. Finally, the expectation maximization algorithm is employed to estimate the parameters of the BCSBM model, and the node-community memberships is obtained through the hard division process, so the community detection is completed. By experimenting on six real-work networks containing different network structures, and comparing with the community detection results of five algorithms, the experimental results show that the BCSBM model not only inherits the advantages of the stochastic block model and can detect various network structures, but also has good data fitting ability due to introducing the betweenness centrality and clustering coefficient of nodes. Overall, the performance of this model is superior to other five compared algorithms.  ( 3 min )
    GRASP: A Goodness-of-Fit Test for Classification Learning. (arXiv:2209.02064v2 [stat.ME] UPDATED)
    Performance of classifiers is often measured in terms of average accuracy on test data. Despite being a standard measure, average accuracy fails in characterizing the fit of the model to the underlying conditional law of labels given the features vector ($Y|X$), e.g. due to model misspecification, over fitting, and high-dimensionality. In this paper, we consider the fundamental problem of assessing the goodness-of-fit for a general binary classifier. Our framework does not make any parametric assumption on the conditional law $Y|X$, and treats that as a black box oracle model which can be accessed only through queries. We formulate the goodness-of-fit assessment problem as a tolerance hypothesis testing of the form \[ H_0: \mathbb{E}\Big[D_f\Big({\sf Bern}(\eta(X))\|{\sf Bern}(\hat{\eta}(X))\Big)\Big]\leq \tau\,, \] where $D_f$ represents an $f$-divergence function, and $\eta(x)$, $\hat{\eta}(x)$ respectively denote the true and an estimate likelihood for a feature vector $x$ admitting a positive label. We propose a novel test, called \grasp for testing $H_0$, which works in finite sample settings, no matter the features (distribution-free). We also propose model-X \grasp designed for model-X settings where the joint distribution of the features vector is known. Model-X \grasp uses this distributional information to achieve better power. We evaluate the performance of our tests through extensive numerical experiments.  ( 2 min )
    Everything, Everywhere All in One Evaluation: Using Multiverse Analysis to Evaluate the Influence of Model Design Decisions on Algorithmic Fairness. (arXiv:2308.16681v1 [stat.ML])
    A vast number of systems across the world use algorithmic decision making (ADM) to (partially) automate decisions that have previously been made by humans. When designed well, these systems promise more objective decisions while saving large amounts of resources and freeing up human time. However, when ADM systems are not designed well, they can lead to unfair decisions which discriminate against societal groups. The downstream effects of ADMs critically depend on the decisions made during the systems' design and implementation, as biases in data can be mitigated or reinforced along the modeling pipeline. Many of these design decisions are made implicitly, without knowing exactly how they will influence the final system. It is therefore important to make explicit the decisions made during the design of ADM systems and understand how these decisions affect the fairness of the resulting system. To study this issue, we draw on insights from the field of psychology and introduce the method of multiverse analysis for algorithmic fairness. In our proposed method, we turn implicit design decisions into explicit ones and demonstrate their fairness implications. By combining decisions, we create a grid of all possible "universes" of decision combinations. For each of these universes, we compute metrics of fairness and performance. Using the resulting dataset, one can see how and which decisions impact fairness. We demonstrate how multiverse analyses can be used to better understand variability and robustness of algorithmic fairness using an exemplary case study of predicting public health coverage of vulnerable populations for potential interventions. Our results illustrate how decisions during the design of a machine learning system can have surprising effects on its fairness and how to detect these effects using multiverse analysis.  ( 3 min )
    Generative Sliced MMD Flows with Riesz Kernels. (arXiv:2305.11463v2 [cs.LG] UPDATED)
    Maximum mean discrepancy (MMD) flows suffer from high computational costs in large scale computations. In this paper, we show that MMD flows with Riesz kernels $K(x,y) = - \Vert x-y\Vert^r$, $r \in (0,2)$ have exceptional properties which allow their efficient computation. We prove that the MMD of Riesz kernels coincides with the MMD of their sliced version. As a consequence, the computation of gradients of MMDs can be performed in the one-dimensional setting. Here, for $r=1$, a simple sorting algorithm can be applied to reduce the complexity from $O(MN+N^2)$ to $O((M+N)\log(M+N))$ for two measures with $M$ and $N$ support points. As another interesting follow-up result, the MMD of compactly supported measures can be estimated from above and below by the Wasserstein-1 distance. For the implementations we approximate the gradient of the sliced MMD by using only a finite number $P$ of slices. We show that the resulting error has complexity $O(\sqrt{d/P})$, where $d$ is the data dimension. These results enable us to train generative models by approximating MMD gradient flows by neural networks even for image applications. We demonstrate the efficiency of our model by image generation on MNIST, FashionMNIST and CIFAR10.  ( 2 min )
    Learning Channel Importance for High Content Imaging with Interpretable Deep Input Channel Mixing. (arXiv:2308.16637v1 [cs.CV])
    Uncovering novel drug candidates for treating complex diseases remain one of the most challenging tasks in early discovery research. To tackle this challenge, biopharma research established a standardized high content imaging protocol that tags different cellular compartments per image channel. In order to judge the experimental outcome, the scientist requires knowledge about the channel importance with respect to a certain phenotype for decoding the underlying biology. In contrast to traditional image analysis approaches, such experiments are nowadays preferably analyzed by deep learning based approaches which, however, lack crucial information about the channel importance. To overcome this limitation, we present a novel approach which utilizes multi-spectral information of high content images to interpret a certain aspect of cellular biology. To this end, we base our method on image blending concepts with alpha compositing for an arbitrary number of channels. More specifically, we introduce DCMIX, a lightweight, scaleable and end-to-end trainable mixing layer which enables interpretable predictions in high content imaging while retaining the benefits of deep learning based methods. We employ an extensive set of experiments on both MNIST and RXRX1 datasets, demonstrating that DCMIX learns the biologically relevant channel importance without scarifying prediction performance.  ( 2 min )
    High Dimensional Time Series Regression Models: Applications to Statistical Learning Methods. (arXiv:2308.16192v1 [econ.EM])
    These lecture notes provide an overview of existing methodologies and recent developments for estimation and inference with high dimensional time series regression models. First, we present main limit theory results for high dimensional dependent data which is relevant to covariance matrix structures as well as to dependent time series sequences. Second, we present main aspects of the asymptotic theory related to time series regression models with many covariates. Third, we discuss various applications of statistical learning methodologies for time series analysis purposes.  ( 2 min )
    Information Theoretically Optimal Sample Complexity of Learning Dynamical Directed Acyclic Graphs. (arXiv:2308.16859v1 [stat.ML])
    In this article, the optimal sample complexity of learning the underlying interaction/dependencies of a Linear Dynamical System (LDS) over a Directed Acyclic Graph (DAG) is studied. The sample complexity of learning a DAG's structure is well-studied for static systems, where the samples of nodal states are independent and identically distributed (i.i.d.). However, such a study is less explored for DAGs with dynamical systems, where the nodal states are temporally correlated. We call such a DAG underlying an LDS as \emph{dynamical} DAG (DDAG). In particular, we consider a DDAG where the nodal dynamics are driven by unobserved exogenous noise sources that are wide-sense stationary (WSS) in time but are mutually uncorrelated, and have the same {power spectral density (PSD)}. Inspired by the static settings, a metric and an algorithm based on the PSD matrix of the observed time series are proposed to reconstruct the DDAG. The equal noise PSD assumption can be relaxed such that identifiability conditions for DDAG reconstruction are not violated. For the LDS with WSS (sub) Gaussian exogenous noise sources, it is shown that the optimal sample complexity (or length of state trajectory) needed to learn the DDAG is $n=\Theta(q\log(p/q))$, where $p$ is the number of nodes and $q$ is the maximum number of parents per node. To prove the sample complexity upper bound, a concentration bound for the PSD estimation is derived, under two different sampling strategies. A matching min-max lower bound using generalized Fano's inequality also is provided, thus showing the order optimality of the proposed algorithm.  ( 3 min )
    On the Equivalence between Implicit and Explicit Neural Networks: A High-dimensional Viewpoint. (arXiv:2308.16425v1 [cs.LG])
    Implicit neural networks have demonstrated remarkable success in various tasks. However, there is a lack of theoretical analysis of the connections and differences between implicit and explicit networks. In this paper, we study high-dimensional implicit neural networks and provide the high dimensional equivalents for the corresponding conjugate kernels and neural tangent kernels. Built upon this, we establish the equivalence between implicit and explicit networks in high dimensions.  ( 2 min )

  • Open

    [D][R] Best way to upsample features in a neural network
    Hi fellow computer scientists, ​ 1) I have been wondering if there is a preferred way to upsample features. I though about 3 options: ​ 1.a) Upsample Layer + Conv Layer ​ 1.b) Transposed Conv Layer + Conv Layer ​ 1.c) PixelShuffle Layer + Conv Layer ​ 2) Also, considering option 1.c, should the Conv layer multiply the number of PixelShuffle output features by the scale factor because PixelShuffle does reduce the number of output features? i.e. I have a tensor of dims (B, C, W, H, D) and with shape (1, 60, 64, 64, 64). After the Pixel-shuffle with an upscale factor of 4 I get the tensor of shape (1, 15, 256, 256, 256). Afterwards the following Conv layer should output a tensor like: 2.a) (1, 15, 256, 256, 256), where in_channels=15 and out_channels=15 2.b) (1, 60, 256, 256, 256), where in_channels=15 and out_channels=60 Note the second option reinstates the number of input features. ​ 3) I have an additional question that can happens in both 1.a, 1.b and 1.c options. Imagine I need to upsample my features by a factor of 8. 3.a) Is it preferred to have multiple upsample blocks (Upsample Layer + Conv layer), where the upsample layers have a scale factor of 2, thus for this example we would have 3 upsample blocks (2 ** 3 = 8). 3.b) Have only one upsample block where the Upsample layer has the full scale factor desired and it is then followed by one Conv layer. ​ Thank you all :) submitted by /u/Christs_Elite [link] [comments]  ( 9 min )
    [D] Am I the only one finding this a bit upsetting?
    Hello everyone, In the process of writing up a literature review for my master's thesis, I wanted to cover the impact of ReLU on the field which was significant. When looking for an original paper I came across this paper/report: https://arxiv.org/abs/1803.08375. There isn't anything special about this work and as a matter of fact, I was surprised that it has thousands of citations (2974 at the moment of writing this post according to Google Scholar). Given this and that this work is not an original ReLU paper but more of a file documenting an implementation of it for a particular setup I found it quite intriguing. Then I started to dig into works that cited this and unexpectedly papers from top conferences such as NeurIPS cited the aforementioned document as a reference to the activation function. Here are some examples: https://proceedings.neurips.cc/paper_files/paper/2022/file/fbb10d319d44f8c3b4720873e4177c65-Paper-Conference.pdf https://proceedings.neurips.cc/paper_files/paper/2022/file/69e2f49ab0837b71b0e0cb7c555990f8-Paper-Conference.pdf The researchers who have done that are not referencing the original ReLU paper instead which I think is a bit disrespectful towards the achievement of original authors. On the other hand, maybe I am overthinking it a bit. ReLU has been around for a while and it would be surprising for someone conducting research in deep learning to not knowing it hence as a reader I wouldn't necessarily mind if people did not include the reference to the paper which is widely known. However, I reckon if a reference is made, then it should be meaningful and correct, and not just another extra few lines in a bibliography making it look big. submitted by /u/dj_giga_chinol [link] [comments]  ( 9 min )
    [D] best local LLM for answering to custom document
    Hi guys, I'm developing a local tool able to reply question related to one or more document. I found a good solution in using sentence embedding followed by similarity search to include only the most significative part of the document in the prompt. In this contest I search for the lightest LLM able to reply to this question. For example, LLM based on Bert are generally smaller but are they good enough? I'm not an expert in this field, I hope I give you meaningful information. Thanks! 🙏 submitted by /u/Tough-Assistant-9740 [link] [comments]  ( 9 min )
    [D][R] New to ML Research, how often are you disheartened when something you have been working on for months does not work out ? and how do you deal with it ?
    I am new to research in ML, at present a grad student and began working in a lab on my own work. My advisor is very understanding, supportive and took a leap of faith to fund me, since I did not have prior experience in research. I have been working on a problem for 4 months now and have been getting poor results for the past week. All the literature surveys, digressions within the problem statements and running the experiments to end up with not-so-good results is extremely disheartening. ​ I am still continuing to run additional experiments, figuring out where things can be going wrong and trying to conduct further analysis, but I feel like I have let down my advisor. I still have the entire semester to work on it and possibly other stuff, I am motivated for it, but at times ponder over the huge chunk of time I have spent on the current work. ​ How do you deal with such results and hitting the wall in your research ? Does it happen often ? What would you advice I do to continue working ? submitted by /u/V1bicycle [link] [comments]  ( 9 min )
    Which text to speech is this? [D]
    https://youtube.com/shorts/mRZMOFqD0F0?si=jyHQVwq2ouAKP1t9 submitted by /u/AdGeneral5378 [link] [comments]  ( 9 min )
    [D] suggestion for AI tools (chat style) that run on-prem with vectorDB?
    Hi, I'm looking to run an on-prem ChatGPT style LLM solution that can ingest private customer data into a VectorDB. So far I have tried three... GPT4All - limited to only allows for up to 13b parameter LLMs and only on CPUs (currently), also its 'localdocs' implementation I've found to only reference its docs very infrequently when answering. H2OGPT - it's implementation of localdocs (I believe via LangChain) seems pretty good. but seems like every time I run an instance, I would have to re-vector my documents. Not sure if there is a way to attach an VectorDB to it so it's ready to go right away. PrivateGPT - seems to work very well, currently it's only running on CPUs thus response time is over a minute. Curious if the community knows of any other products that do this and are already GPU accelerated. ​ TY in advance. ​ ​ submitted by /u/konrad21 [link] [comments]  ( 9 min )
    [D] How to improve my Support Vector Machine (SVM) Paper?
    Hi guys, Seeking some advice from some experienced researchers in support vector machines and kernel methods. I made this paper that breaks down using multi-class SVM in a One Against All approach, how to solve them with Lagrange multipliers https://github.com/jacobmcasey/MultiClass-SVM-Lagrange-Hyperplane-Construction-Paper As it currently stands it’s more a nice educational resource on the topic, rather than a novel contribution. Any ideas how to extend this work into something a bit more impactful? Thanks submitted by /u/Ok_Reality2341 [link] [comments]  ( 9 min )
    [P] Efficient way to implement sparse cross-attention
    I have key-value pairs with an extensive sequence length, alongside a sparse attention mask that is data-dependent, with fewer than 5% of its elements being non-zero. I found out that Xformer has implememation for sparse self-attention (link) but not sure whether the same would work for cross-attention. Also Xformer supports only (fixed) 2D attention mask but in my case the mask is arbitary and is different for different input. Can you suggest an efficient implementation for my scenario? submitted by /u/ankanbhunia [link] [comments]  ( 9 min )
    [D]Why are special tokens not allowed in the prompt for llama-2?
    I was going through the code for Llama-2 text generation on the official github where I stumbled across this code in the generation.py file: B_INST, E_INST = "[INST]", "[/INST]" SPECIAL_TAGS = [B_INST, E_INST, ">", ">"] UNSAFE_ERROR = "Error: special tags are not allowed as part of the prompt." ... ... ... unsafe_requests = [] unsafe_requests.append(any([tag in msg["content"] for tag in SPECIAL_TAGS for msg in dialog])) ... ... ... return [ { "generation": { "role": "assistant", "content": self.tokenizer.decode(t) if not unsafe else UNSAFE_ERROR, } } for t, unsafe in zip(generation_tokens, unsafe_requests) ] Is there a reason why we can't have these tokens in the prompt? I am planning to bypass the role based dictionary entries for the prompt and instead building my own prompt generator that'll take the the system prompts and the user prompts and generate a single string to then send to the LLM. Depending on the the user's choice I want the LLM to generate concise or detailed answers(also impose a word limit in the prompt itself), so I am planning to have this as a dropdown a user can choose. based on the system option chosen(concise/detailed answer), I then want to call my prompt generator which will add the instruction tags around the "system" and "user" prompts to generate 1 string I can then pass to the LLM. I wanted to know if there was any reason these tags aren't allowed to be in the prompt. Is it only to avoid "confusion" on the different roles and following a conventional way to pass the prompts? If not, and there's a reason those tags aren't supposed to be passed inside the prompts, please do let me know,, because inside the same file the chat_completion() function is doing exactly that; adding the > and > around the system prompts and prepending it to the user prompt. submitted by /u/comical_cow [link] [comments]  ( 10 min )
    "[P]" Machine Unlearning: A Novel Framework to Unlearning, Privacy and Defending Against Inference Attacks
    Hey everyone, ​ I am excited to present my latest venture, an initiative aimed at exploring the still-murky waters of Machine Unlearning. While this new project shares its roots with our previous endeavors in biomimetic machine learning, it diverges to concentrate on the fascinating and complex issue of algorithmic forgetfulness. ​ 🎯 **Objective** ​ The cornerstone of this project is not just to create algorithms that can forget, but to do so in a way that's both efficient and secure. Our vision transcends mere algorithmic performance, embracing a multi-faceted approach that also covers privacy protections and robust defenses against model inference attacks. The ambition here is to fortify machine unlearning with a well-rounded, secure architecture, allowing it to handle real-world …  ( 10 min )
    [P] Modular Diffusion: A Python Library for Designing and Training Diffusion Models with PyTorch
    Hello everyone! I've been working on this project for a few months as part of my thesis in Machine Learning. It's a library that provides an easy-to-use yet flexible API to design and train Diffusion Models. I decided to make it because I wanted to quickly prototype a Diffusion Model but there were no good tools to do it with. I think it really can help people prototype their own Diffusion Models a lot faster and only in a few lines of code. The idea is to have a model class that takes different modules corresponding to the different aspects of the Diffusion Model process (noise schedule, noise type, denoising network, loss function, guidance, etc.) and allow the user to mix and match different modules to achieve different results. The library ships with a bunch of prebuilt modules and the plan is to add many more. I also made it super easy to implement your own modules, you just need to extend from one of the base classes available. Below is an example of the type of interface you can expect. I'd really appreciate your feedback! Check out the project here: https://github.com/cabralpinto/modular-diffusion https://preview.redd.it/0itvswxkknlb1.png?width=2528&format=png&auto=webp&s=24ce67955eadb5cf109d19716f4e5a9471b1572d submitted by /u/secularchapel [link] [comments]  ( 9 min )
    [D] What does "the actual number of English-language words" mean?
    D3PM paper https://arxiv.org/pdf/2107.03006.pdf reports perplexity on LM1B dataset. In Appendix B.2 thay authors say: Perplexities are reported relative to the actual number of English-language words in the test set (including an EOS token predicted by the model) How did they compute this number? Did they split sentences by space? Are punctuation symbols considered "English words"? Are chinese characters (which are present in the data) withous spaces counted as one word? Or is it some common knowledge that "LM1B test set contains X words"? The official implementation https://github.com/google-research/google-research/tree/master/d3pm/text is extremely difficult to comprehend. I spent several hours reading throug the code and I still have no idea how they computed the number of words. submitted by /u/Tomarchelone [link] [comments]  ( 9 min )
    [P] Significant improvements for multi-agent reinforcement learning!
    We've just released a new version of our evolutionary hyperparameter optimization RL framework, which is 10x faster than SOTA! This update is focused on multi-agent RL. We've introduced MADDPG and MATD3 to the framework. These algorithms are traditionally super brittle, and RLlib even recommends not to use their own implementation of it. However, our evolutionary framework has solved this problem! You can now train multiple agents in co-operative or competitive Petting Zoo-style (parallel API) environments, with significantly faster training and up to 4x improvement in total return when benchmarked against alternatives. Please check it out! https://github.com/AgileRL/AgileRL submitted by /u/nicku_a [link] [comments]  ( 9 min )
    [P] Interactively explore unstructured datasets from your dataframe (OSS project)
    Hey r/MachineLearning, data inspection and interactive exploration is one of the most important tasks for data teams. This is especially true when dealing with unstructured data that requires a deep domain expertise (e.g. healthcare or engineering). We have tried many different options for visualizing unstructured datasets in the past: Notebooks, dash apps, custom react apps, HTML reports... However, these options were either very time-consuming to develop/maintain or not interactive enough or both. That is why we developed Spotlight: https://github.com/Renumics/spotlight https://i.redd.it/lxjnlkcmumlb1.gif Spotlight supports most unstructured data types including images, audio, text, videos, time-series and geometric data. You can find more info and use case examples for ML and engineering workflows in the repo. Happy to hear your honest feedback! ​ ​ submitted by /u/44sps [link] [comments]  ( 9 min )
    [D] How many target variable classes does sentiment analysis models BERT and RoBERTa have?
    Hi everyone, so I am a little confused on how many target variable classes does the BERT and RoBERTa models have? So I understand these 2 models are pre-trained models, which means the number of target variable classes are fixed (if I am not wrong!). For example, the link below for the RoBERTa model in Hugging Face has fixed 3 target variable classes (Negative, Neutral and Positive): https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest But when I googled around and also asked ChatGPT and Bard, they tell me these models can have as many target variable classes as the user wants (or rather this depends on how many target variable classes there are in the training dataset). If these are pre-trained models already (which already have the number of target variable classes pre-determined in the model already), then how come some of the google sites and ChatGPT and Bard is telling me the user can choose however many target variable classes that they want? ​ submitted by /u/--leockl-- [link] [comments]  ( 9 min )
    [P] Vehicles moving in wrong direction.
    I am working on a professional project which involves detecting a vehicle moving in a wrong direction on the street. Data details : I have consecutive frames of the street in which vehicles are moving. So far : I have created a model that detects the objects inside the frame and give me the coordinates(bounding-box) of those objects(vehicles). And I using the optical flow to produce the optical lines on the objects which are moving inside the consecutive frames and I am also able to get the direction(if a object is moving from top to bottom on the frame it means it is in right direction and if bottom to top in the frame it means wrong direction based on change in y-offset of the object). Now the optical lines code is different which is giving me the direction of the object(I am not using any model to detect object in this code it's based on Lucas-Kadane method) and when I say direction I mean I'm using the cv2.imshow() which actually plays the consecutive frames together and draw optical lines on it and shows me the direction visually. Now the problem is I want the coordinates of the object that is moving in wrong direction (the bounding box coordinates) how can I achieve that? Any suggestions and ideas would be helpful please mind I can't use any other technology or model as this are project requirements in professional setting) submitted by /u/Sherlock_holmes0007 [link] [comments]  ( 9 min )
    [D] How to decide number of trees in hyperparameter tuning?
    The dataset I have consists of around 2,300 observations and 120 variables, of which around 25 are highly correlated, so I narrowed it down to 95 variables. I'm using R's boost_tree() with xgboost as my model. How do I decide when to stop tuning for number of trees, mtry, min_n, and tree depth, without actually overfitting the data? Because as I increase the number of trees (or any other variable like the ones above), my RMSE obviously goes down, but how do I know it is overfitting the training data? Or is there no overfitting in this case, since I am using cross validation (15 fold) already? PS, the test data is 800 observations submitted by /u/heeeehuuuu [link] [comments]  ( 9 min )
    [D] Feature extraction in multivariate time series
    How do you usually do feature extraction for time series data? I used to work on visual domain, so I'm pretty familiar with CVs but recently I was assigned some tasks on multivariate time series data and it's been quite difficult getting used to. Major problem is, while features in vision have semantic meaning not just along temporal axis but also "spatially," the multivariate time series does not. Also, is it considered a "cheating" if i pre-extract certain features that are already established by experts to have high correlation with the result, rather than letting the machine learning algorithm learn on its own those "certain features" in some way through training? Thanks! submitted by /u/-273deg [link] [comments]  ( 9 min )
    [D] MLOps resources
    Hi, I wonder which books or courses you would recommend for intermediate and advanced MLOps/ML design systems. What I mean is topics like handling hundreds of models and their updates, reusable CI/CD pipelines, batch and online architectures, integration with feature stores, jobs/queues for model scheduling, data drift, metrics monitoring, and alerts, and so on. This would be for someone familiar with the major concepts, hands-on experience with MLflow, SageMaker, Azure ML services, Databricks and similar tools. submitted by /u/rodrigo-arenas [link] [comments]  ( 9 min )
    [D] RLHF for multi-turn conversation, Option A or B?
    I have a dataset that consists of dialogues between a user and a chatbot (ChatGPT), and I want to use this data to implement Reinforcement Learning from Human Feedback (RLHF). I have already completed Supervised Fine-Tuning (SFT) and built the reward model. Now, I need some guidance on how to handle the data. Here is an example of the pre-collected data: >> User: Give me a tip on how to succeed in drawing. >>> ChatGPT: Practice regularly and be patient with yourself. Improvement takes time. >>>User: But drawing is hard. >>>ChatGPT: It is, and that's okay. It's normal to find it challenging, especially when you're just starting out. Just try to enjoy the process and don't be too hard on yourself. ​ ==========My input Data 1 is :>> User: Give me a tip on how to succeed in drawing. Suppose my model outputs the following for Input Data 1: ChatGPT: Practice makes perfect. My question is, for Input Data 2, should I use: Option A: User: Give me a tip on how to succeed in drawing. ChatGPT: Practice makes perfect. User: But drawing is hard. In this option, I use the actual previous term's agent output and append the pre-collected user data. Or Option B: User: Give me a tip on how to succeed in drawing. ChatGPT: Practice regularly and be patient with yourself. Improvement takes time. User: But drawing is hard. In this option, I use all the pre-collected data, which might not even be the current model's output. Which option is more appropriate for RLHF, A or B? submitted by /u/No_Oilve_6577 [link] [comments]  ( 9 min )
    [D] Why did the authors design this gradient reversal layer in the paper "Unsupervised Domain Adaptation by Backpropagation"?
    I am reading the famous paper " Unsupervised Domain Adaptation by Backpropagation" again, but still got confused why the authors had to design this gradient reversal layer. To my understanding, simply adding a minus-one (-1) in front of the domain classifier head is good enough. Of course, we need to minimize the original domain classifier head at some point to make it decent. For example, if it is a two-step training like GAN, we can (1) Freeze other parts but only minimize the domain classification loss to update the domain classifier head; and then (2) Freeze the domain classification head, but maximize the domain classificatoin loss to update the feature extractor. We can alternate between (1) and (2). Is the main motivation of gradient reversal layer that we can merge (1) and (2) into a single training step? submitted by /u/AaronSpalding [link] [comments]  ( 9 min )
    [D] Dataset condensation
    Hello everyone, has anyone here read the paper "Dataset Condensation with Gradient Matching"? I've been reading it, but I got stuck trying to understand how they transition from the point where the loss is the distance between parameters to the point where the loss is the distance between gradients. Could someone please explain this process in detail? Apparently, they make the assumption that the initializations are the same and that the distance between parameters is close to zero for every iteration, but I'm still struggling to comprehend how they arrive at the conclusion that the distance is now between gradients. submitted by /u/Ok-Cartographer-1363 [link] [comments]  ( 9 min )
    "[P]" A Scientific Exploration into the Integration of Biomimicry Principles within Machine Learning Algorithms
    Hey everyone, I am excited to introduce a project that delves into the experimental fusion of Biomimicry principles with Machine Learning algorithms. While the concept of unlearning serves as our initial prototype, the overarching ambition extends far beyond, aiming to pioneer new methodologies inspired by natural phenomena. 🎯 Objective The core objective of this research is to investigate the feasibility and efficacy of incorporating biomimetic principles into machine learning algorithms. The goal is not merely to improve algorithmic performance but also to introduce novel methods that can tackle complex computational problems, much like how nature solves intricate issues in an energy-efficient manner. --- 📑 Methodological Outline **Conceptual Framework**: The project adopts a…  ( 10 min )
  • Open

    Andrew Ng doesn't think RL will grow in the next 3 years
    From his latest talk on AI, he has ever field of ML growing in market size / opportunities except for RL. Do people agree with this sentiment? Unrelated, it seems like RL nowadays is borrowing SL techniques and apply to offline datasets. submitted by /u/wardellinthehouse [link] [comments]  ( 9 min )
    Achieving 4000x Speedups with PureJaxRL
    submitted by /u/shrekkertech [link] [comments]  ( 9 min )
    Does anybody know why gym environments are opening in not secure window on my browser?
    ​ https://preview.redd.it/ina42mr5wplb1.png?width=1298&format=png&auto=webp&s=f65fe5eade9dc1f3312ff280436e6e0a5ba6e380 submitted by /u/nimageran [link] [comments]  ( 9 min )
    Question about forward-view TD compares to planning in model-based RL
    I have a confusion in difference between forward-view TD sampling and model-based RL. Assuming using approximation function. In forward-view TD (more than one step), the reward sampling is the future estimation in according to the currently policy (kind like searching the best situation). What is the different between the forward-view TD which likely to be planned by the policy (assuming greedy) and the model-based RL which planned by the model of fake environment? Does the only difference is model-based able to predict the result of action in 1-2-3 step in the future (in agent's head) from the transition model where model-free rely on the approx. function? submitted by /u/AnnonymeowCat [link] [comments]  ( 9 min )
    suggestion for AI tools (chat style) that run on-prem and allow for vectorDB input?
    Hi I'm looking to run an on-prem ChatGPT style LLM that can ingest private customer data via a VectorDB. So far I have tried three... GPT4All - limited to only allows for up to 13b parameter LLMs and only on CPUs (currently), also its 'localdocs' implementation I've found to only reference its docs very infrequently when answering. H2OGPT - it's implementation of 'localdocs' (I believe via LangChain) seems pretty good. but seems like every time I run an instance, I would have to re-vector my documents. Not sure if there is a way to attach an VectorDB to it so it's ready to go right away. PrivateGPT - seems to work very well, currently it's only running on CPUs thus response time is over a minute. Curious if the community knows of any other products that do this and are already GPU accelerated. ​ TY in advance. ​ ​ submitted by /u/konrad21 [link] [comments]  ( 9 min )
    Significant improvements for multi-agent reinforcement learning!
    We've just released a new version of our evolutionary hyperparameter optimization RL framework, which is 10x faster than SOTA! This update is focused on multi-agent RL. We've introduced MADDPG and MATD3 to the framework. These algorithms are traditionally super brittle, and RLlib even recommends not to use their own implementation of it. However, our evolutionary framework has solved this problem! You can now train multiple agents in co-operative or competitive Petting Zoo-style (parallel API) environments, with significantly faster training and up to 4x improvement in total return when benchmarked against alternatives. Please check it out! https://github.com/AgileRL/AgileRL submitted by /u/nicku_a [link] [comments]  ( 9 min )
    OpenLLaMA2 - A DeepSpeed/Ray based LLaMA2 RLHF framework
    https://github.com/OpenLLMAI/OpenLLaMA2 We can fully fine-tune a 7B llama2 RLHF model with a Single A100 GPU. We are also developing a Ray-based RLHF framework for > 30B models. We provide PPO hyperparameters out of the box. Welcome to participate in the development and star OpenLLaMA2. ​ Some PPO vs. SFT examples. https://github.com/OpenLLMAI/OpenLLaMA2/blob/main/docs/ppo_examples.md SFT Human: Please introduce the game GTA5 Assistant: GTA5, or Grand Theft Auto V, is an action-adventure video game developed by Rockstar North and published by Rockstar Games. It was released in 2013 for the PlayStation 3 and Xbox 360, and later in 2014 for the PlayStation 4 and Xbox One. The game is set in the fictional city of Los Santos, based on Los Angeles, and follows the story of three crimi…  ( 10 min )
    RLHF, option A or B, plz help me choose
    I have a dataset that consists of dialogues between a user and a chatbot (ChatGPT), and I want to use this data to implement Reinforcement Learning from Human Feedback (RLHF). I have already completed Supervised Fine-Tuning (SFT) and built the reward model. Now, I need some guidance on how to handle the data. Here is an example of the pre-collected data: >> User: Give me a tip on how to succeed in drawing. >>> ChatGPT: Practice regularly and be patient with yourself. Improvement takes time. >>>User: But drawing is hard. >>>ChatGPT: It is, and that's okay. It's normal to find it challenging, especially when you're just starting out. Just try to enjoy the process and don't be too hard on yourself. ​ ==========My input Data 1 is :>> User: Give me a tip on how to succeed in drawing. Suppose my model outputs the following for Input Data 1: ChatGPT: Practice makes perfect. My question is, for Input Data 2, should I use: Option A: User: Give me a tip on how to succeed in drawing. ChatGPT: Practice makes perfect. User: But drawing is hard. In this option, I use the actual previous term's agent output and append the pre-collected user data. Or Option B: User: Give me a tip on how to succeed in drawing. ChatGPT: Practice regularly and be patient with yourself. Improvement takes time. User: But drawing is hard. In this option, I use all the pre-collected data, which might not even be the current model's output. Which option is more appropriate for RLHF, A or B? submitted by /u/No_Oilve_6577 [link] [comments]  ( 10 min )
  • Open

    Does anyone know if an AI can help me?
    My friend has a picture of herself from a while ago with a fake tattoo. She had the tattoo made from an original image, but she doesn't have it anymore. Is there an AI that could take the tattoo from the picture that is on her body and make it into a 2d version that can be made into a tattoo guide? submitted by /u/StitchTheFox [link] [comments]  ( 9 min )
    AI System Can Predict Chemical Smells Based on Molecular Structures
    A new study cites the creation of an AI system that can predict how a specific compound will smell by analyzing its molecular structure. You can check it out here. If you want to stay on top of the latest trends and insights in AI, look here first. Why is this significant? The AI system, developed by researchers at startup Osmo, can utilize 55 descriptive words to assign a smell or 'aroma' to a chemical compound or 'odorant'. This breakthrough might be utilized to enhance the food and cleaning product industries where synthetic scents play an essential role. What’s next for this AI system? The AI's predictions often aligned closer with human consensus than any individual guess, indicating its robustness and potential. The next step for this research is to comprehend how different odorants mix and compete to yield a smell that the human brain identifies as unique. However, the sheer number of combinations, even with a small set of odorants, poses a daunting task. To quote Stuart Firestein, a neurobiologist at Columbia University, “Predicting what a mix smells like is the next frontier.” P.S. If you like this kind of analysis, you’ll love my free newsletter that tracks the most relevant news and research in AI and tech. (source) submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    AI — weekly megathread!
    News provided by aibrews.com Researchers introduce ‘Swift’, the first autonomous vision-based drone that beat human world champions in several fair head-to-head races. This marks the first time that an autonomous mobile robot has beaten human champions in a real physical sport [Details]. Generative AI updates from Google Cloud Next event: General availability of Duet AI in Google Workspace [Details]. SynthID - a tool for watermarking and identifying AI images generated by Imagen (Google’s text-to-image diffusion model). It embeds a digital watermark directly into the pixels of an image, making it invisible to the human eye, but detectable for identification, without reducing the image quality [Details]. AlloyDB AI for building generative AI applications with PostgreSQL [Details]. …  ( 11 min )
    AI-powered hate speech detection will moderate voice chat in Call of Duty
    submitted by /u/SAT0725 [link] [comments]  ( 9 min )
    Generative AI could potentially automate up to 75 million global jobs, ILO Study Finds
    submitted by /u/Senior_tasteey [link] [comments]  ( 9 min )
    How to generate movies using gen AI/prompts?
    I bet there’s a genius research team out there that started work on this How cool/crazy would that be? submitted by /u/AILaunchpad [link] [comments]  ( 9 min )
    Odd Bing conversation turn
    This happened. Was NOT aware the already extensive and tiresome limitation in discussion subjects was THIS pervasive, and frankly, this fragile egoed. Really? THIS is "controversial?" submitted by /u/HotaruZoku [link] [comments]  ( 9 min )
    TinyTap rolls out new AI features for educators and parents
    submitted by /u/baillyjonthon [link] [comments]  ( 9 min )
    One-Minute Daily AI News 8/31/2023
    Forget smartwatches, Microsoft may make a backpack with an AI assistant.[1] Call of Duty will use AI to moderate voice chats.[2] OpenAI Introduces Special Tutor Prompts To Implement ChatGPT In Classrooms.[3] Google Meet’s new AI will be able to go to meetings for you.[4] Sources: [1] https://www.windowscentral.com/software-apps/forget-smartwatches-microsoft-may-make-a-backpack-with-an-ai-assistant [2] https://www.theverge.com/2023/8/30/23852652/call-of-duty-activision-modulate-toxmod-artificial-intelligence-voice-moderation [3] https://robots.net/news/openai-introduces-special-tutor-prompts-to-implement-chatgpt-in-classrooms/ [4] https://www.theverge.com/2023/8/29/23849056/google-meet-ai-duet-attend-for-me submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    We will take our symbiosis with animals to the next level.
    submitted by /u/kipaxbooks [link] [comments]  ( 9 min )
  • Open

    How to Save a Neural Network Model in Python Tensorflow?
    submitted by /u/aheadMake57 [link] [comments]  ( 9 min )
  • Open

    Elevating the generative AI experience: Introducing streaming support in Amazon SageMaker hosting
    We’re excited to announce the availability of response streaming through Amazon SageMaker real-time inference. Now you can continuously stream inference responses back to the client when using SageMaker real-time inference to help you build interactive experiences for generative AI applications such as chatbots, virtual assistants, and music generators. With this new feature, you can start streaming the responses immediately when they’re available instead of waiting for the entire response to be generated. This lowers the time-to-first-byte for your generative AI applications. In this post, we’ll show how to build a streaming web application using SageMaker real-time endpoints with the new response streaming feature for an interactive chat use case. We use Streamlit for the sample demo application UI.  ( 12 min )
    FMOps/LLMOps: Operationalize generative AI and differences with MLOps
    Nowadays, the majority of our customers is excited about large language models (LLMs) and thinking how generative AI could transform their business. However, bringing such solutions and models to the business-as-usual operations is not an easy task. In this post, we discuss how to operationalize generative AI applications using MLOps principles leading to foundation model operations (FMOps). Furthermore, we deep dive on the most common generative AI use case of text-to-text applications and LLM operations (LLMOps), a subset of FMOps. The following figure illustrates the topics we discuss.  ( 23 min )
  • Open

    Fast-tracking fusion energy’s arrival with AI and accessibility
    MIT Plasma Science and Fusion Center will receive DoE support to improve access to fusion data and increase workforce diversity.  ( 8 min )

  • Open

    Thoughts on ZTM? [D]
    Thoughts on the Zero to Mastery programs? There is a machine learning bootcamp course on Udemy that is part of that program. I feel like i've heard negative reviews about them in the past, but it's only 12.99 right now and I feel like it covers a lot of content. So I guess I'm just wondering if it's really that bad, or if the course would be worth my time? Would it really take me from "Zero to Mastery"? Thanks submitted by /u/Mountain-Economy1476 [link] [comments]  ( 9 min )
    Math for ML Course on Udemy [D]
    Are there any good math for machine learning courses on Udemy? I specifically want a course that offers lots of exercises so I am able to practice what I learn. Thanks submitted by /u/Mountain-Economy1476 [link] [comments]  ( 9 min )
    "[D]" A Scientific Exploration into the Integration of Biomimicry Principles within Machine Learning Algorithms
    Hey everyone, I am excited to introduce a project that delves into the experimental fusion of Biomimicry principles with Machine Learning algorithms. While the concept of unlearning serves as our initial prototype, the overarching ambition extends far beyond, aiming to pioneer new methodologies inspired by natural phenomena. --- 🎯 **Objective** The core objective of this research is to investigate the feasibility and efficacy of incorporating biomimetic principles into machine learning algorithms. The goal is not merely to improve algorithmic performance but also to introduce novel methods that can tackle complex computational problems, much like how nature solves intricate issues in an energy-efficient manner. --- 📑 **Methodological Outline** **Conceptual Framework**: The proje…  ( 10 min )
    [P] We embedded all SEC and Press Releases data for US companies, it is available for retrieval
    Retrieval augmented generation (RAG) is one of the most popular way to add additional knowledge to your LLMs. To do RAG well, you need to do three things well - Curate high quality datasets Create abstractions (embeddings, keyword indexes, knowledge graphs) Stitch everything together for better retrieval We have realized that it is even harder than what it looks like. We want to easily enable this infra for a range of datasets, starting with company-specific data. You can give it a go here on our playground or get started with our open sourced library submitted by /u/achyutjoshi [link] [comments]  ( 9 min )
    [D] Anyone submitted to CPAL?
    There was a paper submission deadline for Conference on Parsimony and Learning (CPAL) earlier this week. This is their first conference so I expect the number of submissions to be very small, but has anyone submitted? I am guessing they received like 100 or 200ish submissions. submitted by /u/neurogramer [link] [comments]  ( 9 min )
    [D] Best frameworks and tools to design ml based web applications
    As the title says, I'm looking for a list of the best tools and framework to learn, useful for build machine learning solution as web application. I want to move my projects from being jupyter notebooks using tensorflow or pytorch, to ml API and applications. submitted by /u/AcquaFisc [link] [comments]  ( 9 min )
    [D] How do you track what you learnt from the papers?
    It has always been a struggle for me. I tried to take notes as I read paper, but that’s not quite sustainable because it’s difficult to track where did the notes come from for more details. Or I highlight the sections with added comments but that’s also not quite accessible when you have tones of pdf lying around somewhere or worse print outs. Recently I’ve been trying a cloud based pdf reader that stores my papers and allow searches over all highlights and comments (Pond) Thinking if I could also use it to share papers with my colleagues but I’m not sure if it will work because that will require them to use it as well. How do you solve this ? submitted by /u/dockerun [link] [comments]  ( 9 min )
    [D] need dataset for my research project
    I am working on a project for my research but need a dataset which contains the generation and consumption of electricity for Micro Hydro Power station, anyone could help me. I will be grateful submitted by /u/Due-Draft6855 [link] [comments]  ( 9 min )
    [N] Supporting the Open Source AI Community
    https://a16z.com/2023/08/30/supporting-the-open-source-ai-community/ From the text: We believe artificial intelligence has the power to save the world—and that a thriving open source ecosystem is essential to building this future. Thankfully, the open source ecosystem is starting to develop, and we are now seeing open source models that rival closed-source alternatives. Hundreds of small teams and individuals are also working to make these models more useful, accessible, and performant. These projects push the state of the art in open source AI and help provide a more robust and comprehensive understanding of the technology. They include: instruction-tuning base LLMs; removing censorship from LLM outputs; optimizing models for low-powered machines; building novel tooling for model inference; researching LLM security issues; and many others. However, the people behind these projects often don’t have the resources available to pursue their work to conclusion or maintain it in the long run. The situation is more acute in AI than traditional infrastructure, since even fine-tuning models requires significant GPU computing resources, especially as open source models get larger. ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [R] LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models - University of Illinois 2023
    Paper: https://arxiv.org/abs/2308.16137 Abstract: In recent years, there have been remarkable advancements in the performance of Transformer-based Large Language Models (LLMs) across various domains. As these LLMs are deployed for increasingly complex tasks, they often face the needs to conduct longer reasoning processes or understanding larger contexts. In these situations, the length generalization failure of LLMs on long sequences become more prominent. Most pre-training schemes truncate training sequences to a fixed length (such as 2048 for LLaMa). LLMs often struggle to generate fluent texts, let alone carry out downstream tasks, after longer contexts, even with relative positional encoding which is designed to cope with this problem. Common solutions such as finetuning on longer …  ( 9 min )
    [R] CoTracker: A Revolutionary 2D Point Video Tracker
    CoTracker - a 2D point-tracking tool for videos - promises to revolutionize motion tracking. Through the use of a transformer network, it meticulously predicts point trajectories and visibility across video frames, giving insights like never before. https://i.redd.it/g0u5t9n1ehlb1.gif Here's why CoTracker is turning heads: CoTracker leverages advanced transformer formulation: Utilising a grid of input tokens that evolve to output tokens, CoTracker allocates initial values derived from the track's start point and time. It's built to handle extended videos through 'windowed inference': Windowing enables the algorithm to handle videos beyond its maximum window length by splitting them into overlapping segments. 'Unrolled Learning' caters to semi-overlapping windows effectively: By employing two unique types of losses, only a modest amount of windows are used in loss computation while still handling expansive videos at test time. Improved tracking through simultaneous multi-point selection: By tracking multiple points at once, CoTracker is able to better establish correlation and motion paths within videos. Despite its notable strengths, there are limitations. Its sliding-window approach cannot handle long-term occlusions that last longer than a window, and its transformer-based model has a high computational cost that grows quadratically with the number of tracked points. According to the authors, “The result is a flexible and powerful tracking algorithm that outperforms state-of-the-art methods in almost all benchmarks”. But it’s yet to be seen how it will perform in real-life tasks. What do you think? P.S. If you like this type of analysis, you might want to check this out. (arXiv) (GitHub) submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    [Project] Combining Prompt Engineering with Structured Inputs to LLMs to Generate Insights on Predictions from Binary Classifiers
    All of this is framed through the lens of improving your ability to understand if a prediction for an upcoming UFC match is good or not. Happy to dig further into the ML and processing around this. https://blog.wolftickets.ai/teaching-a-wolf-to-speak-transforming-fight-predictions-into-insights.html Any feedback is appreciated! submitted by /u/wolfticketsai [link] [comments]  ( 9 min )
    [P] Autolabel: data labeling with LLMs
    Hi everyone, Wanted to share an open source project we've been working on for the last few weeks: Autolabel is an open source Python library to label and enrich text datasets with LLMs (Large Language Models). Why? Access to clean, labeled data is a huge bottleneck for most ML/data science teams. From experiments across a variety of NLP tasks and datasets, we have found that the most capable LLMs are able to label data at better quality than human annotators, but 20-100x faster. Getting Started¶ You can get started with the library by defining a JSON config, and writing a few lines of code: from autolabel import LabelingAgent, AutolabelDataset agent = LabelingAgent('config.json') dataset = AutolabelDataset('dataset.csv', 'config.json') labels = agent.run(dataset) Installation guide Sample notebooks that show how to use the library for different labeling tasks. Technical report for benchmarking LLM and human annotator performance across a range of tasks and datasets. Call for Feedback We just open sourced this library, and are actively developing it. Feedback is very welcome and so are requests for features. You can open an issue on Github for bugs and request features submitted by /u/nihit-d [link] [comments]  ( 9 min )
    [D] Notation problem of equation 1 from the paper Axiomatic attribution for deep networks?
    In Equation 1 of the paper "Axiomatic Attribution for Deep Networks", the denominator of the gradient is $\partial x_i$ (See Eq1). However, according to the paper(with Eq2), shouldn't it be $\partial (x'_i + \alpha \times (x_i-x'_i))$ rather than $\partial x_i$? I found many following papers which refer to this paper also use the notation like this. Do I misunderstand something? submitted by /u/qjall [link] [comments]  ( 9 min )
    [Discussion] Leveraging Leaky Softplus Activation with Momentum-Based Optimizers like Adam for Efficient Neural Network Training
    In the realm of deep learning, the choice of activation functions and optimization algorithms can significantly impact the training process and the performance of neural networks. A relatively lesser-known gem in this landscape is the "leaky softplus" activation function, which, when paired with momentum-based optimizers like Adam, can lead to exceptionally efficient and effective training outcomes. The Leaky Softplus Activation Function The leaky softplus activation function combines the benefits of both linearity and non-linearity in a graceful manner. Defined as Math.Log(Math.Exp(x) + 1) + (x / 16), it smoothly transitions between a nearly linear response for negative inputs and a more pronounced non-linear response for positive inputs. This unique characteristic enables it to address…  ( 10 min )
    [P] DeepEval - Neural Framework For Testing LLMs
    Hi everyone, I built DeepEval - an open-source unit testing framework for LLMs in order to accelerate development and iteration. The problem When designing software applications, testing has always been critical for a lot of production applications. However - with the rise of LLM applications, the type of testing required needs to change in order to adapt for the large number of possible queries. We therefore built DeepEval in order to make it easy to write LLM tests in just 1 line of code. We hope this solution is of value to future teams when iterating on their RAG pipelines, migrating LLM models, testing their fine-tuned LLMs. The solution The DeepEval framework is as follows: We split up testing LLMs into 4 main sections: - Answer Relevancy (how relevant an answer is to a question) - measured using a question-answer bi-encoder. - Factual consistency (whether the generated answer is hallucinating) - measured using entailment from an NLI model - Conceptual similarity (when given a ground truth, how closely does it relate to it - for example How big is it? The size of an orange vs 20 square centimetres.) - measured using vector similarity - Bias, Toxic classification (measured through DL classifier models) I would love any feedback on what we are building here and welcome any OS contributions! submitted by /u/ConfectionSafe954 [link] [comments]  ( 9 min )
    [D] Training models when you have limited compute power
    I've been wanting to take a code chatbot model like starchat or codellama and tune it to our codebase, problem is all I have at work is a Mac with 8gb of RAM. I talk with my boss today and can ask for some stuff if I want and can give good reason. What's the most efficient way to get the compute I need to train the model. Any other advice on how to go about doing this is greatly appreciated submitted by /u/Kechup17 [link] [comments]  ( 9 min )
    [N] DINOv2 is now available under the Apache 2.0 license
    Meta AI has made their DINOv2 self-supervised learning method for training computer vision models truly open source by publishing it under Apache 2.0 license. DINOv2 has outperformed previous state-of-the-art self-supervised learning methods on a variety of computer vision tasks, including image classification, object detection, and semantic segmentation. It is also more efficient to train than previous methods, making it more accessible to researchers and practitioners. DINOv2 is different from existing methods because it provides a new way to train high-performance computer vision models without the need for labeled data. This makes it possible to train models on large datasets of unlabeled images, which can be more cost-effective and time-efficient than collecting and labeling large datasets of images. New demo: https://dinov2.metademolab.com/ submitted by /u/noiseinvacuum [link] [comments]  ( 9 min )
    [P] Deep reinforcement learning library to import multiple URDF robots and objects ?
    I have experience in deep learning but am a beginner in using deep reinforcement learning for robotics. However, I have recently gone through the huggingface course on deep reinforcement learning. I tried tinkering around with panda-gym but am having trouble trying to start my own project. I am trying to use two UR5 robots do some bimanual manipulation tasks e.g. have the left arm hold onto a cup while the right pours water into it. panda-gym allows me to import a URDF file of my own robot but I can't find the option to import my own objects like the xml file (or any extension) of a table or a water bottle. I have no idea which library allows me to import multiple URDF robots and xml objects and was hoping for some help. EDIT : I actually just read about Gazebo and was wondering if it'll allow me to do the above ? As a beginner I still have zero experience with ros and gazebo. submitted by /u/I_am_a_robot_ [link] [comments]  ( 9 min )
    [D] Optimizing simple distributions for something other than maximum likelihood
    As everyone knows, we usually optimise for maximum likelihood when fitting distributions like gaussians (equivalent to the forward KL-divergence). But for neural networks, techniques like GANs allow the minimisation of other distances like Mutual Information or Reverse KL. While this is certainly a very cool and insightful approach, it's also highly complex. I wonder wether other approaches to this problem exist for the simpler case, like fitting a gaussian or some other analytic distribution. From statistics, I have only encountered maximum likelihood and it's variations like robust statistics. submitted by /u/LeanderKu [link] [comments]  ( 9 min )
    [P] I created a package implementing a SOTA technique for XAI ( Explainable AI)
    This is the package https://github.com/mfumagalli68/xi-method Follow the README and install directly from pypi. From the paper: " [..]To bridge this gap we propose a family of measures of statistical association whose definition is well-posed also for nonordered data. Our intuition is to rely on separation measurements between probability mass functions. Here, by separation measurement we mean any distance or divergence between probability mass functions that is positive, and that is null if and only if the probability mass functions coincide. Then, we show that the new class of sensitivity indices complies with Renyi’s postulate D of measures of statistical dependence (Renyi, 1959). This postulate, called zero-independence property in the following, requires that a measure of associat…  ( 10 min )
  • Open

    "What Are Dreams For?" (twitching in fetal dreaming suggests dreams are offline RL for learning motor control)
    submitted by /u/gwern [link] [comments]  ( 9 min )
    DQN can't solve frozen lake environment
    Hello all, I am trying to solve the frozen lake environment using DQN. And I see two issues. One is that the loss falls down to zeros and second the agent only reaches the goal only 5 times in 1000 epochs. Here's my code. import numpy as np import tensorflow as tf from tensorflow.keras import layers, activations import matplotlib.pyplot as plt import gym def create_agent(num_inputs, num_outputs, layer1, layer2): inputs = layers.Input(shape=(num_inputs, )) hidden1 = layers.Dense(layer1)(inputs) activation1 = activations.relu(hidden1) hidden2 = layers.Dense(layer2)(activation1) activation2 = activations.relu(hidden2) outputs = layers.Dense(num_outputs, activation='linear')(activation2) model = tf.keras.Model(inputs, outputs) return model loss_mse = tf.keras.losses.MeanSquaredError() lear…  ( 10 min )
    "Echo Chess: The Quest for Solvability" (level design preference learning: predicting high-quality soluble mazes using human feedback from quitting rates)
    submitted by /u/gwern [link] [comments]  ( 9 min )
    [P] Library to import multiple URDF robots and objects ?
    I have experience in deep learning but am a beginner in using deep reinforcement learning for robotics. However, I have recently gone through the huggingface course on deep reinforcement learning. I tried tinkering around with panda-gym but am having trouble trying to start my own project. I am trying to use two UR5 robots do some bimanual manipulation tasks e.g. have the left arm hold onto a cup while the right pours water into it. panda-gym allows me to import a URDF file of my own robot but I can't find the option to import my own objects like the xml file (or any extension) of a table or a water bottle. I have no idea which library allows me to import multiple URDF robots and xml objects and was hoping for some help. submitted by /u/I_am_a_robot_ [link] [comments]  ( 9 min )
    Mini-Batch in PPO
    Hi I am struggling to understand the mini-batch in PPO. Say I already collected two trajectories. Traj_A = [t = 1, t= 2, t=3 ,.... t = 100] Traj_B = [t =1 , t=2, ... t= 78] Now, I heard you usually break this down onto mini-batch (say a batchsize of 6). Do you do random sampling? eg, one batch is [Traj_A_t=1, Traj_A_t=2, Traj_A_t=100, Traj_A_t=66, Traj_A_t=77, Traj_A_t=55]??? OR do you need to maintain some sequence [Traj_A_t=1, Traj_A_t=2, Traj_A_t=3, Traj_A_t=4, Traj_A_t=5, Traj_A_t=6]??? submitted by /u/No_Oilve_6577 [link] [comments]  ( 9 min )
  • Open

    every time i talk to llama 2 it sounds like its scared of getting punished
    submitted by /u/nicdunz [link] [comments]  ( 9 min )
    Breaking: US expands export restrictions on Nvidia AI chips to Middle East
    The US government has imposed expanded export restrictions affecting Nvidia’s leading artificial intelligence chips, curbing their exportation beyond China to certain Middle Eastern countries. If you want to stay on top of AI advances, look here first. https://preview.redd.it/xe7ho00t0ilb1.png?width=1240&format=png&auto=webp&s=61225931bf3e316efd90eab83846402d4148aca2 Why this matters: Nvidia’s A100 and H100 chips are affected: These AI chips are important and used to accelerate machine-learning tasks on major AI applications like ChatGPT. Despite the restrictions, Nvidia maintains they won’t have an “immediate material impact” on its results. Other companies, like AMD, are also affected: They’ve reportedly received similar restrictions notice, hinting at a broader move by the US government to control the distribution of AI chip technology. The move is part of a larger geopolitical play: These restrictions form part of the Biden administration’s efforts to curtail Beijing’s ability to capitalize on the AI revolution. How Nvidia and the industry might respond: Nvidia CEO Jensen Huang has cautioned the US: In a Financial Times interview, Huang warned that imposing such restrictions could lead to “enormous damage” to the US tech industry, predicting China may become self-sufficient in AI chip development. Yet, Nvidia still managed impressive earnings recently: Despite these challenges, Nvidia recently reported quarterly revenue of $13.5bn, exceeding predictions by $2bn. Further restrictions could significantly alter the landscape for AI development, potentially fostering greater innovation in countries affected or even a race to develop independent solutions. P.S. If you like this kind of analysis, you might want to check this out. (source) submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    Baidu publicly releases their AI chatbot Ernie Bot
    In a bid to rival the United States’ stronghold in the AI industry, Chinese search engine and AI firm Baidu, has made its ChatGPT-equivalent language model, Ernie Bot, fully available to the public. This marks a significant move on the AI chessboard. If you want to stay on top of everything AI, look here first. https://preview.redd.it/g68sr07iihlb1.jpg?width=1024&format=pjpg&auto=webp&s=c0c873badd448257bcc2fb125188acc198e504d6 Why does this matter? Baidu's public release of Ernie Bot signals the company's aggressive push in the generative AI market. By opening up its model to the public, Baidu can leverage expansive real-world human feedback to improve Ernie Bot. China's determination to lead the AI industry is unabated, with many tech firms launching their own generative models in response to OpenAI's popular ChatGPT. Baidu's move further fuels this rivalry. Regulation in China seems to support such AI advancements. CEO Robin Li voiced his optimism about the AI regulations—calling them "more pro-innovation than regulation". What's the broader response? Baidu's latest stride has boosted its stock price by over 3%, underlining the market's high anticipation of Baidu's AI efforts. Ernie Bot has rocketed to the top of Apple's iOS free app chart in China. This demonstrates a positive initial response from the public. Regulation is key in China's AI game: China has stringent regulations for the generative AI industry, requiring a security review and government approvals before any product launch. Moreover, companies need to comply with governmental tech and data requests. The US, on the other hand, doesn't currently have such regulations in place. A markedly different approach that could significantly influence the development and application of AI technologies. If you like this kind of analysis, you might want to check this out. (source) submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    AI Seach
    I'm looking for a AI Search tool that replaces the search bar on a website. Search tool will scrape that sites data and offer suggestions. Any recommendations? submitted by /u/CauliflowerTiny1454 [link] [comments]  ( 9 min )
    SmartGPT: Major Benchmark Broken - 89.0% on MMLU + Exam's Many Errors
    submitted by /u/Sonic_Improv [link] [comments]  ( 9 min )
    Help Me Understand ChatGPT
    I'm currently researching how users interact with ChatGPT and its features, and I'd really appreciate your insights, experience, and perspective. Why should you participate? It's a quick 5-minute survey. Your identity and responses are completely anonymous. Your input will significantly contribute to important research on ChatGPT. The final research document will be posted to this sub. Survey Link: https://forms.gle/tNBib2dA1ErFEwbk6 Rest assured, all information will be confidential and only used for the purpose of this research. Thank you for your time submitted by /u/aaron-cesaro [link] [comments]  ( 9 min )
    Best AI to bypass Ai detection for essays and assignment
    So yeah it's an open book course, but I'm horrible at flow and grammar. I need to be able to fix these things without getting in trouble. Ten years ago in my undergrad friends and family would do the final proofreading for me to make small changes. Is undetectable reputable. submitted by /u/6ixsideOT [link] [comments]  ( 9 min )
    Chat with your favorite characters from movies, TV shows, books, history, and more (+ Discord bot)
    ​ ChatFAI characters Hey everyone, ChatFAI has a special connection with this community because this is where I got it started. It was a simple web app that allowed you to interact with your favorite characters from movies, TV shows, books, history, and beyond. Now, it is a lot more. It has public APIs and an official Discord bot integration now. A lot of performance improvements have been made in the recent days. People have created a lot of characters (https://chatfai.com/characters) The Discord bot is still a new area so could you share feedback if you guys check it out? You can also find it in the Discord app directory. submitted by /u/usamaejazch [link] [comments]  ( 9 min )
    AI-powered drone beats human champion pilots | "Swift AI used technique called deep reinforcement learning to win 15 out of 25 races against world champions"
    submitted by /u/Tao_Dragon [link] [comments]  ( 9 min )
    One-Minute Daily AI News 8/30/2023
    Tesla is about to flip the switch on its $300 million new AI cluster, featuring 10,000 Nvidia H100 compute GPUs.[1] Intel has revealed two new Intel Xeon processors this week at Hot Chips 2023 to give designers new options for efficient server-level performance.[2] General Motors is using conversational AI chatbots to handle simple OnStar calls, freeing up the service’s human employees to address more complex requests, the company said Tuesday.[3] Microsoft announces Turing Bletchley v3 vision-language model for Bing image searches.[4] Sources: [1] https://www.tomshardware.com/news/teslas-dollar300-million-ai-cluster-is-going-live-today [2] https://www.allaboutcircuits.com/news/intel-reveals-two-new-xeon-processor-lines-at-hot-chips-2023/ [3] https://www.theverge.com/2023/8/29/23849390/gm-google-cloud-ai-chat-bot-onstar [4] https://www.neowin.net/news/microsoft-announces-turing-bletchley-v3-vision-language-model-for-bing-image-searches/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    What type of model(s) do you think Spotify are using for their DJ feature to seamlessly transition every song? It’s not as easy as just crossfading for x seconds, every one is beat-matched quite literally like a real DJ.
    submitted by /u/sardoa11 [link] [comments]  ( 9 min )
    US Copyright Office seeks public input on AI and copyright
    The US government is taking steps to address the complex and controversial issues around AI and intellectual property rights. The US Copyright Office is opening a public comment period on August 30th to hear from various stakeholders on the topic. Important Details: The agency is asking for comments on three main questions: How should AI be defined and categorized for the purposes of copyright? What are the implications of AI for the rights of authors and owners of works? What are the implications of AI for the liability and responsibility of users and distributors of works? The agency also wants to hear about related issues, such as: how AI may affect publicity rights and unfair competition laws. The agency notes that AI may create works that mimic or impersonate the voices, likenesses, or styles of real people, which could raise ethical and legal concerns. Finally, they want to determine how AI may affect moral rights and cultural heritage: The agency acknowledges that AI may create works that are derivative or transformative of existing works, which could affect the reputation and integrity of the original creators and their communities. The deadline to submit your comments is October 18th and specific instructions for submitting comments are available on the Copyright Office website. P.S. If you like this kind of analysis, I write a free newsletter that tracks the most relevant news and research in AI and tech—stay updated in under 3 mins/day. (source) submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    Can you tell it’s artificial?
    I was playing around with the Eleven labs V2 multilingual model and I have to say it’s extremely impressive. Does this sound like the real Tucker? submitted by /u/Exitium_Maximus [link] [comments]  ( 9 min )
  • Open

    Sonnets are square
    In his book How to Read Literature Like a Professor, Thomas Foster says that if a poem looks like a square on the printed page, it’s likely a sonnet. The miracle of the sonnet, you see, is that it is fourteen lines long and written almost always in iambic pentameter. … suffice it to say […] Sonnets are square first appeared on John D. Cook.  ( 4 min )
  • Open

    Use Amazon SageMaker Model Cards sharing to improve model governance
    One of the tools available as part of the ML governance is Amazon SageMaker Model Cards, which has the capability to create a single source of truth for model information by centralizing and standardizing documentation throughout the model lifecycle. SageMaker model cards enable you to standardize how models are documented, thereby achieving visibility into the lifecycle of a model, from designing, building, training, and evaluation. Model cards are intended to be a single source of truth for business and technical metadata about the model that can reliably be used for auditing and documentation purposes. They provide a fact sheet of the model that is important for model governance.  ( 10 min )
    Use Amazon SageMaker Model Card sharing to improve model governance
    One of the tools available as part of the ML governance is Amazon SageMaker Model Cards, which has the capability to create a single source of truth for model information by centralizing and standardizing documentation throughout the model lifecycle. SageMaker model cards enable you to standardize how models are documented, thereby achieving visibility into the lifecycle of a model, from designing, building, training, and evaluation. Model cards are intended to be a single source of truth for business and technical metadata about the model that can reliably be used for auditing and documentation purposes. They provide a fact sheet of the model that is important for model governance.  ( 10 min )
  • Open

    WeatherBench 2: A benchmark for the next generation of data-driven weather models
    Posted by Stephan Rasp, Research Scientist, and Carla Bromberg, Program Lead, Google Research In 1950, weather forecasting started its digital revolution when researchers used the first programmable, general-purpose computer ENIAC to solve mathematical equations describing how weather evolves. In the more than 70 years since, continuous advancements in computing power and improvements to the model formulations have led to steady gains in weather forecast skill: a 7-day forecast today is about as accurate as a 5-day forecast in 2000 and a 3-day forecast in 1980. While improving forecast accuracy at the pace of approximately one day per decade may not seem like a big deal, every day improved is important in far reaching use cases, such as for logistics planning, disaster management, agr…  ( 93 min )
  • Open

    Meet Five Generative AI Innovators in Africa and the Middle East
    Entrepreneurs are cultivating generative AI from the west coast of Africa to the eastern edge of the Arabian Desert. Gen AI is the latest of the big plans Kofi Genfi and Nii Osae have been hatching since they met 15 years ago in high school in Accra, Ghana’s capital that sits on the Gulf of Read article >  ( 7 min )
    Morphobots for Mars: Caltech Develops All-Terrain Robot as Candidate for NASA Mission
    Academics Mory Gharib and Alireza Ramezani in 2020 were spitballing a transforming robot that is now getting a shot at work that’s literally out of this world: NASA Mars Rover missions. Caltech has unveiled its multi-talented robot that can fly, drive, walk and do eight permutations of motions through a combination of its skills. They Read article >  ( 6 min )
    GeForce NOW Gets Wild, With ‘Party Animals’ Leading 24 New Games in September
    Just like that, summer falls into September, and some of the most anticipated games of the year, like the Cyberpunk 2077: Phantom Liberty expansion, PAYDAY 3 and Party Animals, are dropping into the GeForce NOW library at launch this month. They’re part of 24 new games hitting the cloud gaming service in September. And the Read article >  ( 8 min )
  • Open

    AI Frontiers: AI in India and beyond with Sriram Rajamani
    In this episode of the Microsoft Research Podcast, Managing Director of Microsoft Research India Sriram Rajamani discusses how generative AI is impacting the lab’s approach to research and how the country’s many languages can help advance conversational systems. The post AI Frontiers: AI in India and beyond with Sriram Rajamani appeared first on Microsoft Research.  ( 30 min )
  • Open

    4 data compliance standards to know for 2023
    Data is crucial in most industries today. As the amount of business information grows, so do the standards for people’s protection of their personal information. With advanced cyberattacks, security compliance frameworks and cybersecurity have become essential fields to ensure data is collected, organized, stored, and managed in a safe way. This article will start by… Read More »4 data compliance standards to know for 2023 The post 4 data compliance standards to know for 2023 appeared first on Data Science Central.  ( 24 min )
    How the LDMs in knowledge graphs can complement LLMs
    Large language models (LLMs) fit parameters (features in data topography) to a particular dataset, such as text scraped off the web and conformed to a training set.  Logical data models (LDMs), by contrast, model what becomes shared within entire systems. They bring together the data in a system with the help of various kinds of… Read More »How the LDMs in knowledge graphs can complement LLMs The post How the LDMs in knowledge graphs can complement LLMs appeared first on Data Science Central.  ( 21 min )
  • Open

    In-Datacenter Performance Analysis of a Tensor Processing Unit
    submitted by /u/recklessdesuka [link] [comments]  ( 9 min )

  • Open

    What is your favorite AI website for research?
    I work in science research and want to introduce new tools to my students. We are looking for AI that can read tables, charts, figures, and spreadsheets, and possibly run statistics on this information. We are also looking for AI that can be given a prompt and will write on chosen topic with proper citation of sources. This information will not be used for publication, but rather, to organize main ideas and provide examples. An art AI that can draw or mimic images of real insects would be nice as well. Preferably these will all be free to use. submitted by /u/wolfmonarchyhq [link] [comments]  ( 9 min )
    Can You Solve a Time-Traveling Puzzle Designed by GPT-4? Win Bitcoin (100$) & Save the Future!
    submitted by /u/stefanbg92 [link] [comments]  ( 9 min )
    Shifting order in multiple-choice questions massively affects LLM performance
    Recent research proposes that Large Language Models (LLMs) may not be as reliable as we think. In fact, the order of options in a multiple-choice question drastically influences the responses from LLMs such as GPT-4 and InstructGPT. If you want to stay on top of the latest trends and insights in AI and tech, look here first. https://preview.redd.it/dxfsq72kzalb1.png?width=1289&format=png&auto=webp&s=e4ed5b541073bde18d2865f2c15e8028388070f5 What are the findings? LLM sensitivity to multiple-choice arrangement: The study suggests if options in multiple-choice questions are reordered, the LLM's performance varies dramatically— approximately 13% to 75% depending on the benchmark. Positional bias shapes responses: When the LLM is uncertain between top-selected answers, the option positioning can artificially lean its predictions. Observations also found that LLMs favor specific placements when unsure of the optimal response among top-selected answers. Performance improves when calibration techniques are applied: Making use of two unique calibration methods, the performance of LLMS saw up to eight percentage points of increase across numerous models and benchmarks. Why does this matter? This moves us closer to identifying the factors contributing to LLMs' sensitivity and highlights the significance of recognizing and confronting these sensitivities to improve real-world usability and reliability. P.S. If you like this kind of analysis, I write a free newsletter that tracks the most relevant news and research in AI and tech—stay updated in under 3 mins/day. (arXiv) submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    The possibilites of a trader AI with infinite profits
    AI can also be dangerous because it could make automated trades in the stock market or cryptocurrency market, and because it remembers all the exchange rate changes in history and the entire economic history of the world and also has all the statistics and mathematical knowledge at its fingertips, it can easily draw conclusions and create an algorithm that might make you bigger profits than any real human. He could also learn from his own mistakes and keep improving. Is this possible? Are there any AIs like this already? submitted by /u/Steve_Hufnagel [link] [comments]  ( 9 min )
    OpenAI Surges Past $1 Billion in Revenue As Demand For AI Explodes
    OpenAI is reportedly making strides in its financial performance and is on track to make $1 billion in revenue over the next 12 months, as per recent reports by The Information. This is a major milestone, signifying not only the success of OpenAI but also the increasing demand and investment in AI. If you want to stay on top of the latest trends and insights in AI and tech, look here first. https://preview.redd.it/afak3xrevalb1.jpg?width=660&format=pjpg&auto=webp&s=cd8f20ac732618b91dbf96928ede42d693f6c4a9 Why should we pay attention? Setting expectations: The Information estimates OpenAI's monthly revenue to be around $80 million, in line with the $1 billion yearly revenue prediction. Undeniably, OpenAI is accelerating. AI Chatbots are in high demand: ChatGPT, OpenAI's phenomenal co…  ( 10 min )
    Wearable Health (WHSI) Joins AI Research Lab for Wearable Health Data
    Wearable Health Solutions to Advise Next Realm AI on Medical Internet of Things (MIoT) Solutions NEWPORT BEACH, CA / ACCESSWIRE / Wearable Health Solutions Inc. (OTC PINK:WHSI) announced inclusion to Next Realm AI research lab to explore development of healthcare IoT solutions utilizing data analytics and artificial intelligence (AI). Wearable Healthcare Solutions will collaborate and advise Next Realm AI, an artificial intelligence and data analytics research lab located in New York City, on such areas as collecting and developing data solutions within the areas of wearables, IoT, and Medical Internet of Things (MIoT). As an official IBM Business Partner, Next Realm AI assists lab members in integrating leading-edge AI and data solutions into their business operations. By leveraging Next Realm's expertise, clients can modernize processes, boost efficiency, strengthen security, and deliver greater value to customers - all while driving growth and building value. https://www.otcmarkets.com/stock/whsi/news/Wearable-Health-Solutions-to-Advise-Next-Realm-AI-on-Medical-Internet-of-Things-MIoT-Solutions?id=411692 submitted by /u/NextRealm_AI [link] [comments]  ( 9 min )
    What are potential careers to take in the field of artifical intelligence?
    I am 23 year old man, I have a degree in Politics, Philosophy, & Economics. Next year I want to do a masters degree, but I haven't chosen which one yet. I am both fascinated by AI, and want to be future-proof in my education. What potential careers do you see, currently or in the near future, in the field of AI, and what studies would you recommend to be well prepared for them? ​ submitted by /u/ApplePenguinBaguette [link] [comments]  ( 9 min )
    IBM invests in $4.5 billion A.I. unicorn Hugging Face | Fortune
    IBM’s CEO, who froze hiring for thousands of back-office jobs and predicted A.I. would take up to 50% of new jobs, just piled into a $4.5 billion tech unicorn’s massive new $235 million funding round submitted by /u/AminoOxi [link] [comments]  ( 9 min )
    Singularity Day just got closer because of Nvidia?
    New advances in AI hardware are making the singularity more likely. AI systems will be able to learn and process information much faster, which could lead to a breakthrough in AI capabilities. These advancements include quantum computing and neuromorphic computing, but more specifically the rise of affordable models like NVIDIA H100 and, more recently, GH200 models. If you are interested in this kind of information, there are more details here. submitted by /u/Powerful-Pumpkin-938 [link] [comments]  ( 9 min )
    Where do AI adult websites get their models from?
    Where do websites like made.porn, pornify.cc or porn.ai get their AI models from? submitted by /u/mixedfeelingz [link] [comments]  ( 9 min )
    Looking for a simulated browser
    Like custom world descriptions, AI apps/sites, etc submitted by /u/roblox22g [link] [comments]  ( 9 min )
    AI Robots from Sci-Fi Movies you didn’t know about
    submitted by /u/Agitated-Spell3979 [link] [comments]  ( 9 min )
    Tesla is powering up its $300 Million AI Supercomputer Today
    Tesla's making a significant power move today as it prepares to bring its brand-new new AI-cluster online. Rocking a hefty 10,000 Nvidia H100 compute GPUs, the machine will tackle high-performance computing (HPC) workloads and AI applications, placing Tesla's capabilities among the global AI elite. If you want to stay on top of the latest trends and insights in AI and tech, look here first. https://preview.redd.it/c5ykgmr6r6lb1.png?width=970&format=png&auto=webp&s=8b830c8754c1a11792149f57d19a57a77fc8b161 Here’s why this matters: This Nvidia H100-based AI supercomputer will be one of the most powerful globally. With a peak performance of 340 FP64 PFLOPS and 39.58 INT8 ExaFLOPS for AI programs, even Leonardo, currently the fourth highest-performing supercomputer, is surpassed. Tesla’s…  ( 10 min )
    One-Minute Daily AI News 8/29/2023
    Research firm SemiAnalysis has declared that Google’s anticipated Gemini AI model will smash OpenAI’s offering by packing a lot more computing power.[1] DoorDash today announced its development of voice ordering capabilities incorporating AI, building on its existing model leveraging best-in-class agents, to further support restaurant operations.[2] The US Air Force wants $6 billion to build a fleet of AI-controlled drones.[3] Google’s DeepMind says it has cracked a problem that has vexed those trying to verify whether images are real or created by AI. Researchers proclaimed their new watermarking SynthID format can be used to pinpoint AI-generated deepfakes without distorting the image’s original quality.[4] Sources: [1] https://beincrypto.com/ai-wars-google-gemini-chatgpt/ [2] https://about.doordash.com/en-us/news/introducing-ai-and-agent-powered-voice-ordering [3] https://www.engadget.com/the-air-force-wants-6-billion-to-build-a-fleet-of-ai-controlled-drones-204548974.html [4] https://gizmodo.com.au/2023/08/deepmind-says-it-has-a-way-to-identify-ai-images-but-only-on-google/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Microsoft's new AoT aims to create more human-like AI
    Microsoft teamed up with Virginia Tech to publish a white paper introducing their new "Algorithm of Thoughts" (AoT). The objective? To make language learning models akin to human learning. https://preview.redd.it/3q8lcq0k96lb1.png?width=2000&format=png&auto=webp&s=fce7a0e3225c64353ad6c51f65e8b490e795feed If you want to stay on top of the latest trends and insights in AI and tech, look here first. What's the big idea? Microsoft's AoT aims to fuse the accuracy of algorithms with the nuances of human reasoning. A bold aspiration indeed, but not a new one. The goal to empower computers to learn for themselves – akin to human cognition - has been an AI objective since its inception back in the 1950s. The AoT could be seen as an attempt to resolve the drawbacks of the "Chain of Thought" (CoT) approach. LLMs following the CoT approach can provide incorrect steps to the right answer, as they base conclusions on precedent. With AoT, the model works to evaluate the soundness of initial steps or "thoughts," reducing the risk of one incorrect step leading to disproportionate results. What could AoT do? Mitigate AI "hallucinations:" These funny— but disconcerting — instances of AI outputting false information. Enhance the integrity of AI interaction: programmers suggest that improvement in this aspect is crucial for aligning AGI (artificial general intelligence). The takeaway: AI's ability to understand and process information like a human being is a longstanding goal in the field. With AoT, Microsoft seems to be making strides toward achieving it. Much remains to be seen on its efficacy: How it will impact the broader AI ecosystem and the user experiences it can create. P.S. If you like this kind of analysis, I write a free newsletter tracking the most relevant news and research in AI and tech—stay informed in under 3 minutes/day. (source) ​ submitted by /u/AIsupercharged [link] [comments]  ( 10 min )
  • Open

    Designing Deep Networks to Process Other Deep Networks
    submitted by /u/nickb [link] [comments]  ( 9 min )
  • Open

    [D] Will I get in? [Fall 2024 MS in ML European Universities]
    TLDR: American, Graduated from U of Michigan in 2019 w/ 3.3 GPA Bs in Comp Sci. Worked at Google for 3 years. Samsung Research America for 3 months. No ML specific work experience. No research. Will I get in to European elite ML programs? If not what do I need to do? My GF and I want to study our masters together in Europe. She’s doing business I want to do ML/AI. I spent the past year kind of goofing off. Got kind of burned out and decided I was going to get into music production so spent the pst year mainly doing that with some software mixed in. Recently been self studying ML, both the math from textbooks and trying my hand at some models in python. I do not have any connections to academia currently and wil have to beg a professor who barely knew me from undergrad for a rec. Can get other recs from past bosses. My plan right now is to look for job hopefully in AI but maybe just more general software engineering again, but long term I want to get a masters in person. My current resume looks like: Graduated BS in Comp Sci from univ of Michigan 2019 3.3 GPA Worked at Google for 3 years Worked at Samsung Research America for 4 months Some self study I can claim but not much tangible proof Recommendation from Google Boss (Maybe) recommendation from UofM CS professor that barely knew me My questions to anyone that knows the admissions right now are: 1) Do you think I get to one off this? (To anyone of these schools) 2) If not what are the things to prioritize to improve my chances? What are the timeline of these steps? Can I do them in the next few months or have to wait till next year? submitted by /u/Srokisthename [link] [comments]  ( 10 min )
    [R] DiffPrep: Differentiable Data Preprocessing Pipeline Search for Learning over Tabular Data
    I just came across this paper, and it just sounds too good to be true. If we regularly spend up to 80% of our time in data preprocessing, this method would suddenly return us A LOT of that time. Has anyone seen it in python code? I haven't found it and I'd love to give it a try with some of my datasets from hell. They do have a GitHub page but I'm too dumb or too noob to make it run in my laptop. submitted by /u/Davidat0r [link] [comments]  ( 9 min )
    [D] Knowledge graph vs text summary+embedding for long term conversational memory
    Hi, I'm relatively new to the space of AI chatbots and I figured I'd get my hands wet with a small personal project. While researching the topic of long term conversational memory I noticed most people are using text embedding in combination with textual summary to generate a conversation history for the AI's prompt. However, this technique seems to have many drawbacks such as loss of details in the summarization process. I was wondering if anyone has experience using knowledge graph DBs like neo4j for conversational memory instead, and what the pros and cons of such an approach are compared to summarization. I'd be greatly interested in any resources that could further my knowledge in this space as my primary goal is to learn from this project. Thanks! submitted by /u/Rainmire [link] [comments]  ( 9 min )
    [Discussion] Does anybody manage to make MuseTree work?
    https://stevenwaterman.uk/musetree/ It's for music generation through musenet. I don't manage to generate anything. It has to be related to API issues or stuff like that? submitted by /u/MusicalSeries [link] [comments]  ( 9 min )
    Shifting order in multiple-choice questions massively affects LLM performance [R]
    Recent research proposes that Large Language Models (LLMs) may not be as reliable as we think. In fact, the order of options in a multiple-choice question drastically influences the responses from LLMs such as GPT-4 and InstructGPT. If you want to stay on top of the latest trends and insights in AI and tech, look here first. https://preview.redd.it/k8yaixbjzalb1.png?width=1289&format=png&auto=webp&s=99ac6280a1e7415f46c0c11938ae20e2b77674b4 What are the findings? LLM sensitivity to multiple-choice arrangement: The study suggests if options in multiple-choice questions are reordered, the LLM's performance varies dramatically— approximately 13% to 75% depending on the benchmark. Positional bias shapes responses: When the LLM is uncertain between top-selected answers, the option positioning can artificially lean its predictions. Observations also found that LLMs favor specific placements when unsure of the optimal response among top-selected answers. Performance improves when calibration techniques are applied: Making use of two unique calibration methods, the performance of LLMS saw up to eight percentage points of increase across numerous models and benchmarks. Why does this matter? This moves us closer to identifying the factors contributing to LLMs' sensitivity and highlights the significance of recognizing and confronting these sensitivities to improve real-world usability and reliability. P.S. If you like this kind of analysis, I write a free newsletter that tracks the most relevant news and research in AI and tech—stay updated in under 3 mins/day. (arXiv) submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    [D] Handwritten Text Recognition (OCR) on Historical Documents
    I am working on developing a solution to transcribe historic texts (Pre-1900's) which are all handwritten. I have some data, around 1000's transcribed sentences with their corresponding images of text. TrOCR looked great, but it still makes a lot of mistakes, probably because of the old English phraseology, so I tried to finetune it with my data and see if it improves and that didn't happen. The data I used to train was my 1000 sentences + some public dataset with another 2500 sentences, so just about 3500 sentences in total. Do you think it's because the data is small, that the performance is bad? I'm finetuning "microsoft/trocr-base-stage1" using native PyTorch. If not TrOCR do you recommend any OCR/HTR models I can finetune to my handwritten historical data? I truly appreciate any guidance you send my way. submitted by /u/daxow [link] [comments]  ( 9 min )
    [P] FlowJax - Normalizing flows in JAX
    Hello everyone, Hopefully this is of interest to some of you. For those that don't know, normalising flows can be used as black-box unconditional or conditional distribution approximators, that support both exact sampling and density evaluations. For an excellent review see https://arxiv.org/abs/1912.02762. I am developing flowjax, a Python package for normalising flows, distributions and bijections. It uses Jax for automatic differentiation, and the equinox framework built by Patrick Kidger to allow for a familiar object-oriented design. It includes many powerful flows, e.g. masked autoregressive flows, coupling flows and block neural autoregressive flows. In addition to inheriting some benefits from using JAX (easy GPU support, some efficiency gains), here's a few points where I think flowjax has some advantages over other packages: Comprehensive documentation Simplified definitions of unconditional/conditional bijections and distributions (particularly nicer handling of the conditional case, which some packages seem to stitch in as an afterthought). Easy to plug in different "transformer" bijections to coupling/masked autoregressive flows. Use of efficiency tricks to optimize run times (e.g. circumventing recompilation of identical layers using jax.lax.scan over the flow layers) It has been used in a couple of papers already, but it would be great to have some more people using it and some feedback/suggestions/contributions. There are examples in the documentation for those that are interested. submitted by /u/LimitedConsequence [link] [comments]  ( 9 min )
    [Project] Models for Unsupervised Anomaly Detection of a Single Continuous Feature?
    Of the many unsupervised anomaly detection models out there (iforest, LOF, SVM etc) I am struggling to find a model that makes sense to use to detect anomalies in a single target feature. My current strategy is to subset the data into different categories and run iforest on a single column. I feel as though this method might not be the best because it basically creates a tree with a single branch and measures how many nodes away a given record might be. My confidence scores never seem to exceed around -.17 on a scale of [-1,1] where -1 tends to more confidence in anomalous behavior Is there a better way? Note: Anomalies in my data occur very infrequently submitted by /u/BeefaroniX [link] [comments]  ( 9 min )
    [Discussion] How are you evaluating and monitoring LLMs?
    Question for people who are implementing LLMs (open source, fine tuned, any kind). How do you know that your getting the quality output from the model that you need to ship the feature or model? Are the audits ad hoc data sampling and subjective "good/bad" ratings or have you figured out a more rigorous framework? Is it pretty much ~vibes~ based? What, if any, tools or processes are you putting into place to monitor and observe the LLM when its interacting with real time user data for weeks or months? Most of the folks I have spoken with are doing very ad hoc sampled output and writing down on post its or in a spreadsheet a subjective quality ratings. One person had developed a slightly more rigorous 3 question survey on "is the result factual", "is the result cogent" and "is the result useful". Not everyone is logging their LLM responses they show users which feels very risky to me. Anyone aware of any industry standards being established around this? submitted by /u/Andy-VertaAI [link] [comments]  ( 9 min )
    [D] A blog post on Yet Another ICML Award Fiasco
    I wrote a blog post on the ICML award fiasco: They gave an outstanding paper award to the D-Adaptation paper, that contains worse results that the ones in papers from 9 years ago. Also, this is not the first time that ICML gives awards to questionable or even plainly wrong papers. I believe this might start a serious conversation about "stochastic" awards, and the super noisy reviews in machine learning conferences. https://parameterfree.com/2023/08/30/yet-another-icml-award-fiasco/ submitted by /u/bremen79 [link] [comments]  ( 9 min )
    [D] Is there a monthly limit for OpenAI service in Azure?
    When using OpenAI's api, there's a default limit of $120/month and my company is about to hit it. I plan on requesting an increase of that limit... but wondering, does Azure's OpenAI service have any monthly limit? By looking at their quotas: https://learn.microsoft.com/en-us/azure/ai-services/openai/quotas-limits it doesn't seem like there's a monthly cap. Is this correct? If so, I see no reason why anyone would use OpenAI's api instead of Azure's, as they cost the same but there's no usage limit. Especially if you expect to increase api usage in the future. submitted by /u/alkibijad [link] [comments]  ( 9 min )
    [P] I created GPT Pilot - a research project for a dev tool that uses LLMs to write fully working apps from scratch while the developer oversees the implementation - it creates code and tests step by step as a human would, debugs the code, runs commands, and asks for feedback.
    Github: https://github.com/Pythagora-io/gpt-pilot Detailed breakdown: https://blog.pythagora.ai/2023/08/23/430/ For a couple of months, I've been thinking about how can GPT be utilized to generate fully working apps, and I still haven't seen any project that I think has a good approach. I just don't think that Smol developer or GPT engineer can create a fully working production-ready app from scratch without a developer being involved and without any debugging process. So, I came up with an idea that I've outlined thoroughly in the blog post above, but basically, I have 3 main "pillars" that I think a dev tool that generates apps needs to have: Developer needs to be involved in the process of app creation - I think that we are still far away from an LLM that can just be hooked up to …  ( 11 min )
    [D] Graph Signal Processing Applications and Training
    I'm studying GSP and I'm stuck on the definition of the Graph Fourier Transform. The sigma notation and signal makes sense, but why is there an \"i\" term at the eigenvector mu? Shouldn't the eigenvector not depend on the \"i\"? And if it does, what does the \"i\" imply? submitted by /u/Ihaveaparrot [link] [comments]  ( 9 min )
    [D] - Given that we can lossily transform text to images and vice versa, multimodality should not be required for AGI or the construction of world-models. Any causal relationship that can be inferred from images/audio/video should be inferable from text.
    Consider video data that captures various interactions between entities—let's say Person A and Person B. We then apply a video summarization network T(x), where x is some video or an entity in the video, onto the video. For sake of argument, let's assume T(x) provides a description of x so detailed that we can decode the description back into the original video without losing much information via some arbitrary text-video model. Now, if we can infer a causal relationship in the video—like Person A punching Person B—then logically, an isomorphic relationship should also be inferable from the text encodings T(A) and T(B) (unless that relationship is one of the small pieces of information lost during the lossy transformation). After all, the encoding is just another representation of the same…  ( 11 min )
    [D] HPC from local servers for deep learning as well as simpler tasks
    Hi all, the company I am working at has several servers used for different tasks including data analysis and machine learning, including smaller tasks as well as deep learning. What are some ways/ technologies they could create a distributed system where users can submit their jobs and they are dispatched automatically? I was thinking of having an entry node that is the only one faced by users, is where all conda environments are and jobs can be submitted from there. Please let me know if you have any suggestions/ tools that you know that would make sense. Thanks in advance! submitted by /u/returnname35 [link] [comments]  ( 9 min )
    [D] HPC from local servers for deep learning as well as simpler tasks
    Hi all, the company I am working at has several servers used for different tasks including data analysis and machine learning, including smaller tasks as well as deep learning. What are some ways/ technologies they could create a distributed system where users can submit their jobs and they are dispatched automatically? I was thinking of having an entry node that is the only one faced by users, is where all conda environments are and jobs can be submitted from there. Please let me know if you have any suggestions/ tools that you know that would make sense. Thanks in advance! submitted by /u/returnname35 [link] [comments]  ( 9 min )
    [P] Self-Hosting a 16B LLAMA 2 Model in the Banking Sector: What Could Go Wrong?
    I've received a freelance job offer from a company in the banking sector that wants to host their own LLAMA 2 model in-house. I'm hesitating to accept the gig. While I'll have access to the hardware (I've estimated that an A100 80GB will be required to host the 16B parameter version and process some fine-tuning & RAG), I'm not familiar with the challenges of self-hosting a model of this scale. I've always relied on managed services like Hugging Face or Replicate for model hosting. For those of you who have experience in self-hosting such large models, what do you think will be the main challenges of this mission if I decide to take it on? ​ Edit: Some additional context information Size of the company: Very small ~ 60 employees Purpose: This service will be combined with a vector store to search content such as Word, Excel and PowerPoint files stored on their servers. I'll implement the RAG pattern and do some prompt engineering with it. They also want me to use it for searching things on specific websites and APIs, such as stock exchanges, so I (probably) need to fine-tune the model based on the search results and the tasks I want the model to do after retrieving the data. submitted by /u/IMissEloquent75 [link] [comments]  ( 9 min )
    [D] Is there anything LangChain can do better than using LLMs directly (either through a website or an API), any examples? Why would someone choose to use it?
    I haven't used ChatGPT a lot or any other LLMs, I've been reading about Langchain and its use cases, and I'm having trouble wrapping my head around exactly what it does. From what I understand, its an alternative interface for LLMs, allowing for easy switching between them, and makes some work for specific use cases easier. If I wanted to write an app or script to interact with LLMs and do other tasks, how would LangChain be better than just making API call(s) to an LLM, getting back the result as a string, and doing whatever with it? submitted by /u/TheTwelveYearOld [link] [comments]  ( 9 min )
    [D] Using LLMs in Production - Model Fallbacks Tutorial + Caching
    Hello r/MachineLearning I'm one of the maintainers of https://github.com/BerriAI/litellm/ - open-source library to call all LLM APIs using the OpenAI format [Anthropic, Huggingface, Cohere, TogetherAI, Azure, OpenAI, etc.]. I'm writing this post to share some of the strategies we use for using LLMs in production, we've served over 2M+ queries so far TLDR: Use Caching + Model Fallbacks for reliability. This post goes into detail of our fallbacks implementation Using LLMs reliably in production involves the following components: Caching - Cache Embedding() and Completion() for all models Model Fallbacks - set fallback_models=['gpt-3.5-turbo', 'command-nightly', 'llama2]. If primary model fails try fallback models. This deals with rate-limiting errors and when Provider APIs go down …  ( 10 min )
    [D] Decision Transformer Alignment should be better than DeepMind ReST
    We've done some experiments recently, see the tech report: https://arxiv.org/abs/2308.12050v1 We train an SFT model and an RM model, then align the LLM with DT/MLE with filtering (ReST) + RM /SFT datasets/SFT model-generated samples https://preview.redd.it/195op5q636lb1.png?width=1081&format=png&auto=webp&s=a9fa862e8a9ab05819484af8619f73d918fdc26a DT is the Decision Transformer alignment MLE is the ReST-like alignment https://preview.redd.it/u6x28fook5lb1.png?width=1118&format=png&auto=webp&s=4a87898129c1238c00071d43809f5daf440b26d8 submitted by /u/seventh_day123 [link] [comments]  ( 9 min )
  • Open

    Could anyone help me why the following list is the optimal policy for this environment? (Reference: Sudharsan's Deep RL book)
    ​ https://preview.redd.it/qderz9bsoblb1.png?width=1195&format=png&auto=webp&s=fb8ec749d0ce5000e66951b173228278a1d4c3a3 submitted by /u/nimageran [link] [comments]  ( 9 min )
    Could anyone help me why the following list is the optimal policy for this environment? (Reference: Sudharsan's Deep RL book)
    ​ https://preview.redd.it/qderz9bsoblb1.png?width=1195&format=png&auto=webp&s=fb8ec749d0ce5000e66951b173228278a1d4c3a3 submitted by /u/nimageran [link] [comments]  ( 9 min )
    Help With RLLib/ Alternatives
    RLLib is currently stealing my remaining sanity, so I'm making a desperate scream into the void. I can't get my troubleshooting right. I built a nice, custom Gym env that I've been running with SB3 and I feel like I'm caught in an endless array of errors, currently: ValueError: The two structures don't have the same nested structure. I can't help but feel like I'm going about this wrong and missing important information on how to do this correctly. The RayLlib Forum hasn't really been filled with people, so I'm asking: Does anyone know of a Debugging Manual/ A Discord Server/ A Migration Guide? submitted by /u/tessherelurkingnow [link] [comments]  ( 9 min )
    Recommendations for RL Library for 'unvectored' environments
    Hi, I'm working on a problem which has a custom gym environment which I've made, and as it interacts with multiple other modules which have their own quirks, I need to use a reinforcement learning library which works in a specific way that I've only seen PFRL use. The training loop needs to be in this format: 'obs, reward, done = agent.step(action)', 'agent.observe(obs, reward, ... )' rather than what I see in most modern RL libraries where you define an agent and then run a '.train()' method. Are there any libraries which work in this way? I'd love to use something like StableBaselines but they don't seem to play nice and I'd rather not rewrite the gym environment if I can avoid it. Thanks submitted by /u/return_reza [link] [comments]  ( 9 min )
    MDPs: gentle tutorial ...
    Markov Decision Processes (MDPs) form the cornerstone of reinforcement learning (RL) and serve as a fundamental modeling tool for making sequential decisions. In this note, we present a comprehensive definition of MDPs and provide a detailed derivation of the Bellman equations, along with the optimality results. Our approach aims to ensure a thorough understanding by avoiding the omission of any steps in the mathematical proofs. The primary goal is to facilitate reading classic textbooks on (approximate) dynamic programming, optimal control, and reinforcement learning, where proofs and derivations can sometimes obscure crucial details, making them less accessible to readers from diverse scientific and engineering backgrounds. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4535241 submitted by /u/omroot [link] [comments]  ( 9 min )
    Do you know Poker Ai gyms for adversarial policy trainings?
    I want to try use adversarial policies (https://arxiv.org/abs/1905.10615) against poker no limit holdem 6-9 players RL models. I was looking for open-ai gym like environement for that project. Im looking for: - access to game state from each player persperctive (to create input for adversarial model) - support for custom bets (not limited to 0, 1/2pot, all-in) - build-in RL models / support for opensoure RL models - option to add custom model as player So far I found those and read readme files: https://github.com/dickreuter/neuron_poker https://github.com/fschlatt/clubs_gym https://rlcard.org/ https://www.deepmind.com/open-source/openspiel Did anybody work on similar project? Which gym did you use, and what experience do you have with it? Since, adversarial policies tend to work better for high-dimensionality I would prefer to 6players variant. I know that modern poker ai approach are not based on pure RL, but I want to check how vulnerable are classic RL poker models. submitted by /u/MrCogito_hs [link] [comments]  ( 9 min )
    Twitter / Machine Learning Community
    submitted by /u/x9182 [link] [comments]  ( 9 min )
    Reinforcement learning environment for cyber security automation
    submitted by /u/limmen [link] [comments]  ( 9 min )
    How do I teach my PPO agent to play Breakout?
    I have coupled my agent with EnvPool in order to speed up the learning process. It seems to be playing Pong in less than an hour. However, when I try to make it Breakout, even after many hours it still struggles. Also, it it seems like the network is facing catastrophic forgetting as after a few hours it's performance suddenly deteriorates. Any ideas to fix this? I tried incorporating major ideas for PPO from here. Here's my code. Feel free to let me know if you have any questions. Since I have incorporated EnvPool, the code won't run in Windows anymore. submitted by /u/Academic-Rent7800 [link] [comments]  ( 9 min )
  • Open

    Deploy self-service question answering with the QnABot on AWS solution powered by Amazon Lex with Amazon Kendra and large language models
    Powered by Amazon Lex, the QnABot on AWS solution is an open-source, multi-channel, multi-language conversational chatbot. QnABot allows you to quickly deploy self-service conversational AI into your contact center, websites, and social media channels, reducing costs, shortening hold times, and improving customer experience and brand sentiment. In this post, we introduce the new Generative AI features for QnABot and walk through a tutorial to create, deploy, and customize QnABot to use these features. We also discuss some relevant use cases.  ( 13 min )
    Automatically generate impressions from findings in radiology reports using generative AI on AWS
    This post demonstrates a strategy for fine-tuning publicly available LLMs for the task of radiology report summarization using AWS services. LLMs have demonstrated remarkable capabilities in natural language understanding and generation, serving as foundation models that can be adapted to various domains and tasks. There are significant benefits to using a pre-trained model. It reduces computation costs, reduces carbon footprints, and allows you to use state-of-the-art models without having to train one from scratch.  ( 13 min )
  • Open

    Modeling and improving text stability in live captions
    Posted by Vikas Bahirwani, Research Scientist, and Susan Xu, Software Engineer, Google Augmented Reality Automatic speech recognition (ASR) technology has made conversations more accessible with live captions in remote conferencing software, mobile applications, and head-worn displays. However, to maintain real-time responsiveness, live caption systems often display interim predictions that are updated as new utterances are received. This can cause text instability (a “flicker” where previously displayed text is updated, shown in the captions on the left in the video below), which can impair users' reading experience due to distraction, fatigue, and difficulty following the conversation. In “Modeling and Improving Text Stability in Live Captions”, presented at ACM CHI 2023, we f…  ( 93 min )
  • Open

    Autonomous innovations in an uncertain world
    Jonathan How and his team at the Aerospace Controls Laboratory develop planning algorithms that allow autonomous vehicles to navigate dynamic environments without colliding.  ( 9 min )
  • Open

    Building a “heavy metal quartet” of AI compilers
    A new quartet of AI compilers: Rammer, Roller, Welder, and Grinder, tackle a range of compiler optimization challenges based on the same tile abstraction, providing a comprehensive solution to connect AI models with hardware accelerators. The post Building a “heavy metal quartet” of AI compilers appeared first on Microsoft Research.
    Research Focus: Week of August 28, 2023
    In this issue: An illusion of predictability in scientific results; Kathleen Sullivan named to Insider’s 30 under 40 in healthcare list; FiGURe: Simple and Efficient Unsupervised Node Representations with Filter Augmentations. The post Research Focus: Week of August 28, 2023 appeared first on Microsoft Research.  ( 9 min )
  • Open

    AI Lands at Bengaluru Airport With IoT Company’s Intelligent Video Analytics Platform
    Each year, nearly 32 million people travel through the Bengaluru Airport, or BLR, one of the busiest airports in the world’s most populous nation. To provide such multitudes with a safer, quicker experience, the airport in the city formerly known as Bangalore is tapping vision AI technologies powered by Industry.AI. A member of the NVIDIA Read article >  ( 6 min )
    Deepdub’s AI Redefines Dubbing From Hollywood to Bollywood
    In the global entertainment landscape, TV show and film production stretches far beyond Hollywood or Bollywood — it’s a worldwide phenomenon. However, while streaming platforms have broadened the reach of content, dubbing and translation technology still has plenty of room for growth. Deepdub acts as a digital bridge, providing access to content by using generative Read article >  ( 5 min )
  • Open

    First time seeing a rare event
    Suppose you’ve been monitoring a rare event for a long time, then you see your first occurrence on the Nth observation. Now what would you say about the event’s probability? For example, suppose you’re wondering whether dogs ever have two tails. You observe thousands of dogs and never see two tails. But then you see […] First time seeing a rare event first appeared on John D. Cook.  ( 5 min )

  • Open

    AI powered personal assistant in private Beta
    submitted by /u/anehzat [link] [comments]  ( 9 min )
    Stanford's DSPy Framework Revolutionizes AI Language Processing Tasks
    Stanford researchers have unveiled a groundbreaking artificial intelligence (AI) framework known as DSPy. Designed to utilize Language Models (LMs) and Retrieval Models (RMs) optimally, DSPy is set to make AI programming more powerful, intuitive, and efficient. Why does this matter? DSPy was built with complex tasks in mind. LMs, like GPT-3, generate Human-like text from given inputs, while RMs retrieve relevant data. DSPy combines their capabilities, enabling tasks like summarizing information from databases. It works on Pythonic syntax, using declarative and composable modules to instruct LMs. DSPy's automatic compiler finetunes the LM to run any program's steps. it replaces manual intermediate-stage labeling and string manipulation with systematic modular pieces. What's unique about DSPy? It introduces "Signatures" and "Teleprompters" that compile your program. A 'signature' explains the task and inputs for the LM, while Teleprompters improve the effectiveness of prompts. Compared to other libraries, DSPy requires minimal labeling and bootstraps any needed intermediate labels. In short, DSPy simplifies delivering more nuanced instructions to AI and retrieving more detailed and accurate responses, thus widening the spectrum of tasks AIs can accomplish. P.S. (small self-plug) If you like this kind of analysis, I write a free newsletter that tracks the most relevant news and research in AI and tech---stay updated in under 3 mins/day. (github) submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    Google's DeepMind Unveils Invisible Watermark to Spot AI-Generated Images
    As AI image generators increase in popularity, differentiating between authentic and AI-created images is becoming more complex. DeepMind, Google's AI unit, is addressing this by developing an imperceptible watermark known as SynthID for its AI-generated images to counter misinformation. https://i.redd.it/y370eu1tt4lb1.gif Why this matters: DeepMind's SynthID tags AI-generated images: Invisible to people but detectable by computers, this watermark hopes to aid in the verification of images. Technology, however, isn't completely foolproof: DeepMind itself acknowledges that intense image manipulation could compromise the watermark. Google's image generator, Imagen, will only apply to images created using this tool: Google aims to instantly identify AI-generated images with this effectively hidden watermark. DeepMind's head of research, Pushmeet Kohli, shared the following details: The watermark changes on images are so subtle that humans wouldn't notice, yet DeepMind can still detect an AI-generated image. Despite any subsequent cropping or editing, the watermark remains identifiable by DeepMind's software. Colors, contrast, or size changes won't affect it. Calls for a standard approach to AI-generated image identification continue: More coordination between businesses is crucial, different methods adopted by various firms add degrees of complexity in tagging AI content. Other tech giants, including Microsoft and Amazon, pledge to watermark some AI content, meeting similar demands for transparency over AI-generated works. P.S. If you like this kind of analysis, I write a free newsletter that keeps you informed of all you need to know about AI developments in under 3 mins/day. (source) submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    New iterative, self-revising language model, SelFee, beating the rest with self-feedback generation
    Introducing SelFee—a reinvented and powerful language model that uses self-feedback and self-revision to generate high-quality responses backed by a team of researchers from KAIST. Unlike previous models, SelFee doesn't rely on external, large-scale language or task-specific models, tipping the scales in the AI world. If you want to stay ahead of the curve in AI and tech, look here first. https://i.redd.it/bgszhpai43lb1.gif Why it matters? SelFee, built on the base of LLaMA-based instruction-following model and fine-tuned, offers a fresh approach - generating an initial solution and self-feedback sequences and then revising its answers until a high-quality response is achieved. Data used for its training and model evaluation was collected from varied sources and fine-tuned with OpenAI API calls, beating the 13B SelFee model with a minimal 7B SelFee model that generated at least three revisions. SelFee proves the potential of iterative revision in enhancing language model responses, indicating that an increase in inference computation of a model may be superior to merely magnifying its size. Features and Limitations: SelFee's effective use of self-feedback significantly improves response quality, avoiding the requirement of external, large-scale language or task-specific models, translating into faster, cost-effective LLM solutions. However, lacking in certain areas compared to ChatGPT, such as math, reasoning, factuality, and coding, SelFee has room for further improvement and growth. The revolution in the AI language model landscape is promising but still an evolving journey, with SelFee being the latest participant driving this change. P.S. If you like this kind of analysis, I write a free newsletter that tracks the most relevant news and research in AI and tech—stay updated in under 3 minutes/day. (source) (github) submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    Camouflage AI
    I thought about an AI, that would intake a couple photos of terrain, analyze the color palette, patterns etc and with that information, would choose like 3 existing camouflage patterns that would blend in the best in the terrain where the photos were taken. Does something like that exist? I know that US army has an AI that creates camo with the use of thousands of photos, and that's how MARPAT and Multicam were made, but I'm interested in an AI that would choose from already existing patterns. Does something like this exist? What do you think of this idea? submitted by /u/BrytolGasMasks [link] [comments]  ( 9 min )
    Is ChatGPT Plus worth it? Or should I stay with the free version and use Bard for stuff that requires web access?
    I'm mainly using it for educational purposes. Thank you. Edit: I'm in the Psych field. I use it to make presentations, summaries, ideas based on references like books, websites, journals. submitted by /u/East_Professional385 [link] [comments]  ( 9 min )
    How far off are we from free AI video makers
    So right now as far as I can tell all the AI video makers are things like a few second clip, stable diffusion changing images with other images, or stock images. Oh and that thing that was on Twitch for a short bit. When are we going to get an actual worth while AI video maker? submitted by /u/crua9 [link] [comments]  ( 9 min )
    The Architecture of Thought: Reflective Structures in Mental Constructs
    submitted by /u/alcanthro [link] [comments]  ( 9 min )
    25 Best Movies exploring concept of Artificial intelligence (1968 -2023 ) I bet you haven’t watched all
    submitted by /u/Agitated-Spell3979 [link] [comments]  ( 9 min )
    ChatGPT usage remains low, Pew Research suggests, as concerns about AI continue to rise
    The usage and fear surrounding ChatGPT aren't as prevalent as you might think, according to a recent poll from Pew Research. Only 18% of Americans have reportedly used ChatGPT. The demographic that uses it the most? Men aged 18-29 that are college-educated, but even that's just a 30-40% usage rate. https://preview.redd.it/0ax222gxczkb1.jpg?width=620&format=pjpg&auto=webp&s=d3b04169d5de1985d1c52dce7962b5f3a543b014 Why does this matter? ChatGPT has still managed to gain a remarkable level of popularity, despite low usage. This suggests that even though not many people are using it, they are aware of it and its potential capabilities. More people reported using ChatGPT for entertainment or to educate themselves rather than for work. People anticipate AI to have a greater impact on jobs such as software engineers, graphic designers, and journalists. But the expectation is that AI as a whole, not just ChatGPT, will be the driving force behind this. Concern about AI is increasing, not decreasing. 47% of respondents said AI makes them more worried than excited, compared to 31% last year. This concern seems to rise with the level of AI knowledge one possesses. Industries unshaken by AI: As per the survey, employed individuals who are aware of ChatGPT don't see it drastically affecting their jobs. The sectors like hospitality, entertainment, construction, and manufacturing feel the least threatened. Stay updated about AI and its influence on different verticals! Don't miss out on the latest insights, developments, and trajectories of AI. Our free newsletter is all you need to be au fait with the AI world. Keep yourself informed in under 3 minutes/day. (source) submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    One-Minute Daily AI News 8/28/2023
    Chinese e-commerce giant Alibaba has added two new generative AI large language models designed to interpret images to its open-source stable.[1] Several top news publications like The New York Times, CNN and the Australian Broadcasting Corporation (ABC) have blocked Microsoft-backed OpenAI to access their content to train its AI models.[2] Intel on Monday said a new data center chip coming out next year will handle more than double the amount of computing work that can be done for each watt of power used, part of a broader industry push to lower electricity consumption.[3] OpenAI unveiled the new service, dubbed “ChatGPT Enterprise,” in a company blog post and said it will be available to business clients for purchase as of Monday. The new offering promises to provide “enterprise-grade security and privacy” combined with “the most powerful version of ChatGPT yet” for businesses looking to jump on the generative AI bandwagon.[4] Sources: [1] https://voicebot.ai/2023/08/28/alibaba-adds-visual-understanding-to-open-source-generative-ai-large-language-models/ [2] https://www.news18.com/tech/several-top-news-publications-block-openai-from-accessing-their-content-8551840.html [3] https://www.reuters.com/technology/intel-says-new-sierra-forest-chip-more-than-double-power-efficiency-2023-08-28/ [4] https://www.cnn.com/2023/08/28/tech/chatgpt-enterprise-openai/index.html submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Prompt engineering for GPT4
    My page on PromptBase: https://promptbase.com/profile/singularity99 submitted by /u/No-Transition3372 [link] [comments]  ( 9 min )
    Has AI-By-Learning really been proven impossible?
    I'm curious what people specifically in the artificial intelligence think about the recent work by Iris van Rooij et al. earlier this month. They seem to have proven that current approaches to reaching AGI, like LLMs, are incapable of achieving it. I'm not convinced. I quickly wrote up a full rebuttal piece explaining how not convinced I was. What about everyone else? submitted by /u/alcanthro [link] [comments]  ( 9 min )
  • Open

    [D] Why don't we build models that design/build better models. Too computationally expensive?
    At what point do we create a model to build/design better models? Models = ml architecture submitted by /u/Significant_Water_28 [link] [comments]  ( 9 min )
    [D] Question: What's the future of image-analytics models?
    Hey everyone, first post on this sub so sorry if there's anything wrong. Right now, what are the cutting edge image processing models? This is in the context of the segmentation of specific features from an image (ie. finding the cars in an image of a busy roadway). The reason I am asking is I want to learn more image processing architectures that way I can find better direction for specific research areas to look into. Thanks in advance! :) submitted by /u/Adventurous-Tower392 [link] [comments]  ( 9 min )
    Stanford's DSPy Framework Revolutionizes AI Language Processing Tasks [R]
    Stanford researchers have unveiled a groundbreaking artificial intelligence (AI) framework known as DSPy. Designed to utilize Language Models (LMs) and Retrieval Models (RMs) optimally, DSPy is set to make AI programming more powerful, intuitive, and efficient. Why does this matter? DSPy was built with complex tasks in mind. LMs, like GPT-3, generate Human-like text from given inputs, while RMs retrieve relevant data. DSPy combines their capabilities, enabling tasks like summarizing information from databases. It works on Pythonic syntax, using declarative and composable modules to instruct LMs. DSPy's automatic compiler finetunes the LM to run any program's steps. it replaces manual intermediate-stage labeling and string manipulation with systematic modular pieces. What's unique about DSPy? It introduces "Signatures" and "Teleprompters" that compile your program. A 'signature' explains the task and inputs for the LM, while Teleprompters improve the effectiveness of prompts. Compared to other libraries, DSPy requires minimal labeling and bootstraps any needed intermediate labels. In short, DSPy simplifies delivering more nuanced instructions to AI and retrieving more detailed and accurate responses, thus widening the spectrum of tasks AIs can accomplish. P.S. (small self-plug) If you like this kind of analysis, I write a free newsletter that tracks the most relevant news and research in AI and tech---stay updated in under 3 mins/day. (github) submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    [P] Finetuning an LLM to imitate someone
    Hello all, I'm trying to understand how to get an LLM to imitate someone, say Shakespeare. It's easy enough to get all of Shakespeare's work. If I've understood the current state of play for LLMs, there are three options: Fine tune an LLM Vectorize your knowledge using something like ChromaDB. Do a similarity search after each prompt and get the LLM to "read" the top n docs Do both I have a feeling that to imitate Shakespeare, fine tuning an LLM might work best. However, if my understanding is correct, the inputs to finetune an LLM must be formatted this way: : "To be" : "Or not to be" The gap I'm having trouble bridging is how do I go from a large text file to this input format? The only idea I've come across is format all of the text like so: : "sentence_1" : "sentence_2" : "sentence_2" : "sentence_3" Are there best practices around this problem? How should I be thinking about this? I've seen companies like character.ai create bots that imitate Elon Musk accurately for example so I know it's doable. I just wonder if they've done it by finetuning an LLM or training one from scratch or something else entirely. submitted by /u/Vanishing-Rabbit [link] [comments]  ( 9 min )
    [N] Google's DeepMind Unveils Invisible Watermark to Spot AI-Generated Images
    As AI image generators increase in popularity, differentiating between authentic and AI-created images is becoming more complex. DeepMind, Google's AI unit, is addressing this by developing an imperceptible watermark known as SynthID for its AI-generated images to counter misinformation. https://i.redd.it/z0fj6f3yt4lb1.gif Why this matters: DeepMind's SynthID tags AI-generated images: Invisible to people but detectable by computers, this watermark hopes to aid in the verification of images. Technology, however, isn't completely foolproof: DeepMind itself acknowledges that intense image manipulation could compromise the watermark. Google's image generator, Imagen, will only apply to images created using this tool: Google aims to instantly identify AI-generated images with this effectively hidden watermark. DeepMind's head of research, Pushmeet Kohli, shared the following details: The watermark changes on images are so subtle that humans wouldn't notice, yet DeepMind can still detect an AI-generated image. Despite any subsequent cropping or editing, the watermark remains identifiable by DeepMind's software. Colors, contrast, or size changes won't affect it. Calls for a standard approach to AI-generated image identification continue: More coordination between businesses is crucial, different methods adopted by various firms add degrees of complexity in tagging AI content. Other tech giants, including Microsoft and Amazon, pledge to watermark some AI content, meeting similar demands for transparency over AI-generated works. (source) submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    [D] Questions on artificial neural networks from a neuroscientist
    Hello everyone. I'm yet another person looking to expand my understanding of artificial intelligence, and I'm trying to get a map of all the language that is used to describe and understand artificial neural networks. My training is in neuroscience, so all my language is focussed on how real neurons are created, interact, form networks, and how those networks interact to take in multisensory observation and output some of the vast variety of things our brains can do. Which leaves me with a lot of questions in my jargon that I cannot currently map onto the jargon of ML/AI, and I'm hoping that participating in this community can help with that, over time. I am already keenly aware that the phrase "artificial neural networks" is very gauzy. There is some biomimicry in their design and arch…  ( 10 min )
    [P] Codellama inference code complete
    Quite recently, I jumped on the boat of trying out llama. I noticed codellama did not provide any inference code. Yes, it provided python files which lets you run the inference but not a programming method rather terminal approach. Terminal approach is great as it allows experts to run and perform inference+evaluation easily. But, if you are just starting out/new/non-seasoned programmer/individual in AI, it is frustrating. Because one, you can't play with actual code, limiting learning opportunities and two, it does not produces the curiosity in most cases to read all the code. On top of that, I realised there was a lack of repositories and articles on this subject to load code-llama even with third-party methods. Which is why, I wrote two notebooks which outlines the process of how you can load code-llama from FAIR repository using code. [Believe me it's fun and filled with learning opportunities] and two how you can use Huggingface to load the model and perform inference. Few points: 1. Performing inference from FAIR repo, requires significant amount of computing resources even for 7B model. 2. Huggingface method can be loaded using free Google Colab subscription. [Feel free to star, if it helped you] GitHub Link: https://github.com/sleepingcat4/codellama-inference submitted by /u/Suspicious-Bird8840 [link] [comments]  ( 9 min )
    Python/Java Developers Interested in Side Projects Outside of Work (FX-Algo) "[Research]""[Discussion]"
    Throw away account for the obvious reason... This is not a job posting or self-promotion. We are networking in an attempt to speak to like-minded people who might be interested in a little side project outside of work. We are keen to speak to a London/UK-based Developer with a Banking sector background to join us on a project outside of work with the vision of potentially growing a fund. In short; we are in the process of developing an FX Macroeconomic Sentiment Divergence Trading Algorithm. There are currently 4 participants in the project (2 Developers and 2 Traders), 3 of whom work for Tier1 IBs in market-facing roles. 1 of the Developers is likely going to leave the project and we are interested in speaking to someone about picking up his part of the project. There are 3 parts to the project. The first part is mostly complete, now leaving the other 2 parts for us to start working on. We have manually backtested the strategy and it proves to be very profitable - more details can be shared about the strategy and results upon engagement. We are all VP-level in our roles and have around 10-15 years of experience in our requisite field. The tech stack for the project is Python, Java, Kafka, MongoDB and Springboot. We are also very interested in integrating some AI/ML modeling, so if you have any experience in this field that would be a big advantage A Banking background and being UK-based is a non-negotiable. If you feel like this could apply to you, get in touch! :) submitted by /u/BuyTheDipSellTheRipp [link] [comments]  ( 9 min )
    [D] Best GPU cloud hosting for a side project that’s easy to scale?
    Context: I have an app that needs GPUs for DL inference (I don’t need GPUs for training, I own a 3070 TI). My DL model inference is pretty slow (the model framework I'm using is known to be slow) so either one machine with multiple beefy GPUs or multiple GPUs on separate machines will be necessary. My machines will be running custom docker containers. Slow inference: I was planning on putting a few GPU instances behind nginx load balancer and running pytriton on the instances. Since inference is pretty slow, I’m worried if multiple people send requests to a server at the same time, there will be significant delays on responses. Has anyone ran into this before and have insight on streamlining slow inference/scaling demand? "Community" Cloud GPUs: I did a lot of research into clo…  ( 10 min )
    [R] Loss of Plasticity in Deep Continual Learning - University of Alberta 2023 - Continual backpropagation maintains plasticity indefinitely!
    Paper: https://arxiv.org/abs/2306.13812 Github: https://github.com/shibhansh/loss-of-plasticity Abstract: Modern deep-learning systems are specialized to problem settings in which training occurs once and then never again, as opposed to continual-learning settings in which training occurs continually. If deep-learning systems are applied in a continual learning setting, then it is well known that they may fail to remember earlier examples. More fundamental, but less well known, is that they may also lose their ability to learn on new examples, a phenomenon called loss of plasticity. We provide direct demonstrations of loss of plasticity using the MNIST and ImageNet datasets repurposed for continual learning as sequences of tasks. In ImageNet, binary classification performance dropped from 89\% accuracy on an early task down to 77\%, about the level of a linear network, on the 2000th task. Loss of plasticity occurred with a wide range of deep network architectures, optimizers, activation functions, batch normalization, dropout, but was substantially eased by L2-regularization, particularly when combined with weight perturbation. Further, we introduce a new algorithm -- continual backpropagation -- which slightly modifies conventional backpropagation to reinitialize a small fraction of less-used units after each example and appears to maintain plasticity indefinitely. https://preview.redd.it/ewl0336sd3lb1.jpg?width=801&format=pjpg&auto=webp&s=e105e6fa86daad84cdc847e96fec3cac5a237c77 https://preview.redd.it/vdd3i46sd3lb1.jpg?width=1159&format=pjpg&auto=webp&s=47dfef94870c94246cb272b7f8299e1033f40873 https://preview.redd.it/zc4tc16sd3lb1.jpg?width=1389&format=pjpg&auto=webp&s=e2f3b064268d475805c153457c7a60b4a1d42b74 ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    Interesting master thesis topics in AI and NLP [P]
    Hi! I am going to write my master thesis within the fields of AI and NLP this year. But I am struggling with finding a topic that interests me. Does anyone here have some good suggestions? I am not that good in deep learning theory, so I am looking for a more applied topics, such as classification or text generation problems. submitted by /u/IndependentSidekick [link] [comments]  ( 9 min )
    [Discussion] Promising alternatives to the standard transformer?
    What are some promising transformer alternatives/variants that you think more folks should be aware of? They need not be new or SOTA! My list so far includes RWKV: https://arxiv.org/abs/2305.13048 (state space) S4, H3, Hyena: https://github.com/HazyResearch/safari (MLP-based) Hypermixer, MLP-mixer: https://arxiv.org/abs/2203.03691 Retnet https://arxiv.org/abs/2307.08621 (random feature-based attention) EVA, LARA https://arxiv.org/abs/2302.04542 (rotary embeddings) RoFormer https://arxiv.org/abs/2104.09864 dynamic convolutions https://arxiv.org/abs/1901.10430v2 My hope is to assemble a list of 10-15 diverse architectures that I can study in depth by comparing and contrasting their designs. Would love to share my findings with this community. submitted by /u/alpthn [link] [comments]  ( 9 min )
    [D] CLIP open vision-language model alternative
    I'm experimenting with CLIP to use it for a downstream task RL which requires good image semantics understanding, but I'm quite disappointed with its performance. I need better contrastive performance in the representations. Any suggestions? ​ omg no way submitted by /u/rima-m [link] [comments]  ( 9 min )
    [P] Build adaptive sparse grids to accurately approximate and integrate functions of multiple variables
    I'm working on a project that provides an adaptive sparse grid algorithm on Chebyshev nodes for interpolation and integration of multivariable functions on k-cells. https://github.com/rnburn/bbai Unlike polynomial interpolants in equispaced points, interpolants in Chebyshev nodes have excellent approximation properties (see Myth 1 of [1]). If a function is Lipchitz continuous, they converge; if a function is smooth with v derivatives and bounded variation for the v-th derivative, then they converge O(n^-v); and if a function is analytic, they converge geometrically. The Chebyshev Gauss-Lobatto nodes define a sequence of nested points, X^1, X^2, ..., that make it possible to build Smolyak sparse grids at Chebyshev nodes ([2], [3]). ![img](4cw0xi8oc2lb1 " ") For bbai, I implemented the …  ( 12 min )
    [D] Optimizing Keyword Search: Balancing SQL Script Enhancements and AI Solutions
    I'm currently thinking about how to implement the "similar keywords" feature. I've prepared a table with keywords that are extracted from several hundred other tables. It includes basic information such as "keyword," "type," "words" (indicating the number of words in a keyword, e.g., "first name" will have "words" = 2), as well as some technical fields (such as database, table, etc.). In our data product, after entering a specific keyword, we have various pieces of information (which I'm not currently focusing on), and among them, we have "SIMILAR KEYWORDS." The results are displayed based on simple SQL queries, for instance: ​ SELECT word, SUM(CASE WHEN type IN ('N', 'T') THEN 1 ELSE 0 END) AS count, COUNT(\*) \* CASE WHEN (word + '%') LIKE u/word \+ '%' THEN 1.5 ELSE 1 END AS score FROM object_keywords WHERE ('% ' + word + '%') LIKE '%' + u/word + '%' AND (database_id = u/database_id OR u/database_id IS NULL) AND ( .... more technical information here. ​ I'm wondering how to improve this process. Would it be worth considering some AI solutions, or should I focus on enhancing the current SQL scripts (e.g., think about a more advanced scoring system)? What are your thoughts on this? Has anyone worked on something similar? submitted by /u/International-Shirt5 [link] [comments]  ( 9 min )
    [D] Is there already a way to use Llama 2 with a very big system prompt?
    I've seen something like that: https://together.ai/blog/llama-2-7b-32k Is there a way to use llama 2 13b chat or 70b chat with 32k prompt? If not what are the alternatives? Would that: https://youtu.be/ypzmPwLH_Q4?feature=shared be the best thing to do? I'm trying to create a chat bot that would have a pretty specific exeprtise. For example: I would like to feed in soccer rules and then make the bot answear questions about soccer. The system prompt is amazing, but is very limited. submitted by /u/Botanical0149 [link] [comments]  ( 9 min )
    [D] Trying to understand Concept learning | Some questions based on Tom Mitchell Chapter 2
    Hi, Im going through Tom Mitchell's Machine Learning and have a couple of questions based on the 2nd chapter : Concept learning. I was hoping I could get some external point of view on these: ​ Pg 44, para 2, part 1 : "advantage of viewing inductive inference systems in terms of their inductive bias is that it provides a nonprocedural means of characterizing their policy for generalizing" Are there any general procedures to identify and validate the inductive bias of a system? Are there any guidelines to ensure the inferred definition of inductive bias is without errors? Assuming all/most predictive algorithms can be defined in terms of their inductive bias, while concentrating on choosing the algorithms which aligns with our philosophy of talking a problem, how can we weigh part…  ( 11 min )
    [Discussion][research] Calibration for (pointer) generative NER
    Trying to understand calibration in NER. One thing which has gained popularity is generative based NERs, which generated pointers to indices of input text for each class. But all typical calibration mechanisms after temp scaling won't generalize here. (not that I know many calibrations myself). Even Bias corrected temp scaling quickly gets overfitted. Do you have any paper that tackles this? Open to discussing techniques and trying out on standard datasets submitted by /u/Designer-Air8060 [link] [comments]  ( 9 min )
    [P] are there free alternatives to sagemaker I can use for my project building?
    I have a more detailing explanation here. I’m thinking sagemaker may help me here but I’m not trying to incur charges just yet. Are there alternatives I can use. Nothing robust, just a place to host my model and embedding tool and then I can easily call it in py file in my app. submitted by /u/brianomars1123 [link] [comments]  ( 9 min )
    [D] How usable is PyTorch for TPU these days?
    See title. My impression has always been that PyTorch for TPU is an in-name only functionality, but I'm curious about first-hand experience from those who have used it after PyTorch 2.0+. Bonus question: has anyone used PyTorch Lightning for running on TPU? If so, how was the experience? submitted by /u/impromptued [link] [comments]  ( 9 min )
    ML Model for Predicting NFL Outcomes [P]
    Hey all, ML noob here dipping their feet in the water. Right now I am trying to make an ML model that predicts "legendary" QBs of the past performances against current NFL teams. I'll be using Jupyter and Sklearn to do this. However, and maybe this is me overthinking things, I'm not sure how sklearn is going to interpret the data in the dataset. Right now I have a dataset containing all these QBs data (passing stats, strengths and weaknesses, etc.). My teams version of the data is essentially going to be the inverse of all these things. I'm just not quite sure what to target when im testing the data that will determine the "prediction" of the legend QBs stat line against the current team. In better words, how will the computer know that I'm trying to find the yards and touchdowns a QB would produce against a certain team when there's not really any target data for this. I feel as though all I have is data that contributes to a potential target data but I lack target data itself and I'm not sure what to do in that regard. I’m making use of supervised learning and decisión trees btw. Thanks! submitted by /u/saggyboobsarecooltoo [link] [comments]  ( 9 min )
  • Open

    Rotation of hidden layer?
    Surely something like this has been tried, but here’s the setup in my head. Tell me if it’s crazy or what you think. Given input vector X do a hidden layer but instead of an activation function pair up neighboring dimensions of the hidden layer vector result and rotate them about the origin in 2d. This would give some kind of nonlinearity surely? The amount they are rotated can be selected by a trainable variable. Of course this requires your hidden layer dimension be divisible by 2. Then this hidden layer can go to an output layer Y. Curious what smarter more experienced people think of this kind of nonlinearity via paired rotation. My thinking was that if you take the vector A representing all the possible data from your generating function for your dataset (maybe even an infinite dimensional vector if you can generate unlimited data) . Then if you rotate A along so many dimensions you could reach the output vector P which is all the Y values corresponding to A. One way to kind of do this would be to split your dataset in half, and then you could have multiple iterations of each dataset rotated by a trainable angle each. This would rotate only each 2 grouped dimensions that you chose when you split the dataset. Hopefully I’m using the right words to convey this. I’m just a hobbyist. Thanks for the feedback! submitted by /u/win10240 [link] [comments]  ( 9 min )
    High-fidelity transmission of information via novel electronic-optical system
    submitted by /u/keghn [link] [comments]  ( 9 min )
  • Open

    SayTap: Language to quadrupedal locomotion
    Posted by Yujin Tang and Wenhao Yu, Research Scientists, Google Simple and effective interaction between human and quadrupedal robots paves the way towards creating intelligent and capable helper robots, forging a future where technology enhances our lives in ways beyond our imagination. Key to such human-robot interaction systems is enabling quadrupedal robots to respond to natural language instructions. Recent developments in large language models (LLMs) have demonstrated the potential to perform high-level planning. Yet, it remains a challenge for LLMs to comprehend low-level commands, such as joint angle targets or motor torques, especially for inherently unstable legged robots, necessitating high-frequency control signals. Consequently, most existing work presumes the provision of…  ( 93 min )
  • Open

    DSC Webinar Series: How to Scale NiFi Deployments to Enable Universal Data Distribution
    As businesses struggle with more data sources and destinations than ever, they strive to bring governance, security, and efficiency to their data ops. To address these concerns, many companies adopted open-source Apache NiFi as a versatile tool for their data distribution needs. While NiFi accelerates the speed at which developers can build new pipelines, managing… Read More »DSC Webinar Series: How to Scale NiFi Deployments to Enable Universal Data Distribution The post DSC Webinar Series: How to Scale NiFi Deployments to Enable Universal Data Distribution appeared first on Data Science Central.  ( 18 min )
    DSC Weekly 29 August 2023
    Announcements Top Stories In-Depth The post DSC Weekly 29 August 2023 appeared first on Data Science Central.  ( 20 min )
    Data migration redefined: Leveraging AI trends for smooth workspace transitions
    In the dynamic landscape of modern business, the art of seamless data migration has evolved into a strategic imperative. As you navigate the intricacies of workspace transformations, you’re met with a complex interplay of technological advancements and operational demands Enter the era of leveraging Artificial Intelligence (AI) to redefine data migration – an approach that… Read More »Data migration redefined: Leveraging AI trends for smooth workspace transitions The post Data migration redefined: Leveraging AI trends for smooth workspace transitions appeared first on Data Science Central.  ( 21 min )
    The future of shipping: How technology is shaping logistics and fulfillment
    Currently, the use of technology in shipping and logistics is leading the industry through a transformative era, driven by rapid technological advancements, undoubtedly marking a pivotal moment in the digital shipping evolution. From automating routine processes to employing intelligent algorithms that predict and optimize routes, the technological revolution is redefining the way goods are transported… Read More »The future of shipping: How technology is shaping logistics and fulfillment The post The future of shipping: How technology is shaping logistics and fulfillment appeared first on Data Science Central.  ( 23 min )
    Generative AI megatrends: The four horsemen of Generative AI
    In the early days of the Internet, there were four ‘horsemen’ of the Internet With IBM’s 4.5 billion investment in Hugging face today, the generative AI landscape is becoming a bit clearer. There are four Generative AI leaders emerging – others lagging – and one unknown Lets look at the four leaders of Generative AI… Read More »Generative AI megatrends: The four horsemen of Generative AI The post Generative AI megatrends: The four horsemen of Generative AI appeared first on Data Science Central.  ( 18 min )
    The power of digital solutions: How mental health apps are transforming patient care
    There seems to be an app for everything, and mental health is no exception. According to a report, the global mental health apps market size was valued at $5.2 billion in 2022 and is predicted to reach $26.36 billion by 2032, at a CAGR of 17.7% during the forecast period.  Mental health apps have emerged… Read More »The power of digital solutions: How mental health apps are transforming patient care The post The power of digital solutions: How mental health apps are transforming patient care appeared first on Data Science Central.  ( 20 min )
    Modern data exchange methods: Exploring the strengths and limitations of leading protocols
    Introduction  In our rapidly digitizing world, how businesses and systems communicate is paramount. The bedrock of this communication lies in data exchange methods, which allow seamless information flow, driving operational efficiencies and enabling innovation. Over the years, various data exchange protocols have emerged, each boasting unique strengths and presenting challenges. As enterprises strive to integrate… Read More »Modern data exchange methods: Exploring the strengths and limitations of leading protocols The post Modern data exchange methods: Exploring the strengths and limitations of leading protocols appeared first on Data Science Central.  ( 23 min )
    Roadmap for building a data-driven, AI-powered supply-chain
    History & Evolution | The Concept of Supply-chain Network, The TOC & the Information Supply-chain | Imagining the future: Supply-chain 5.0 | Supply-chain Analytics Strategy | Roadmap for Building a Data-driven, AI-Powered Supply-chain Part 1: Data-driven supply chain – History & evolution Is the concept of data driving decisions new? The concept of “data supporting… Read More »Roadmap for building a data-driven, AI-powered supply-chain The post Roadmap for building a data-driven, AI-powered supply-chain appeared first on Data Science Central.  ( 22 min )
  • Open

    Wide Horizons: NVIDIA Keynote Points Way to Further AI Advances
    Dramatic gains in hardware performance have spawned generative AI, and a rich pipeline of ideas for future speedups will drive machine learning to new heights, Bill Dally, NVIDIA’s chief scientist and senior vice president of research, said today in a keynote. Dally described a basket of techniques in the works — some already showing impressive Read article >  ( 6 min )
    Google Cloud and NVIDIA Take Collaboration to the Next Level
    As generative AI and large language models (LLMs) continue to drive innovations, compute requirements for training and inference have grown at an astonishing pace. To meet that need, Google Cloud today announced the general availability of its new A3 instances, powered by NVIDIA H100 Tensor Core GPUs. These GPUs bring unprecedented performance to all kinds Read article >  ( 6 min )
    Advantage AI: Elevated Creative Workflows in NVIDIA Canvas, Blender, TikTok and CapCut
    Janice K. Lee, a.k.a Janice.Journal — the subject of this week’s In the NVIDIA Studio installment — is a TikTok sensation using AI to accelerate her creative process, find inspiration and automate repetitive tasks.  ( 8 min )
  • Open

    Stellar magnitude
    Imagine the following dialog. “Logarithms are usually taken to integer bases, like 2 or 10.” “What about e?” “OK, that’s an example of an irrational base, but it’s the only one.” “Decibels are logarithms to base 101/10.” “Really?!” “Yeah, you can read about this here.” “That’s weird. But logarithms are always take to bases bigger than […] Stellar magnitude first appeared on John D. Cook.  ( 6 min )
    Area codes
    US telephone area codes are allocated somewhat randomly. There was a deliberate effort to keep geographical proximity from corresponding to numerical proximity, unlike zip codes. (More of zip code proximity here.) In particular, consecutive area codes should belong to different states. The thought was that this would reduce errors. It’s still mostly the case that […] Area codes first appeared on John D. Cook.  ( 6 min )
  • Open

    MLOps for batch inference with model monitoring and retraining using Amazon SageMaker, HashiCorp Terraform, and GitLab CI/CD
    In this post, we describe how to create an MLOps workflow for batch inference that automates job scheduling, model monitoring, retraining, and registration, as well as error handling and notification by using Amazon SageMaker, Amazon EventBridge, AWS Lambda, Amazon Simple Notification Service (Amazon SNS), HashiCorp Terraform, and GitLab CI/CD. The presented MLOps workflow provides a reusable template for managing the ML lifecycle through automation, monitoring, auditability, and scalability, thereby reducing the complexities and costs of maintaining batch inference workloads in production.  ( 15 min )
  • Open

    "Loss of Plasticity in Deep Continual Learning", Dohare et al 2023 (Adam particularly harmful for catastrophic forgetting)
    submitted by /u/gwern [link] [comments]  ( 9 min )
  • Open

    Using AI for tiered cloud platform operation
    Cloud Intelligence/AIOps research from Microsoft could help organizations autonomously manage the entire cloud platform. Find out how. The post Using AI for tiered cloud platform operation appeared first on Microsoft Research.  ( 15 min )

  • Open

    OpenAI finally launches ChatGPT Enterprise
    OpenAI has announced a new product for businesses that want to use its AI technology. ChatGPT Enterprise is a subscription service that offers unlimited, fast, and secure access to GPT-4 and other features that can help businesses improve their workflows and communication. If you want to stay ahead of the curve in AI and tech, look here first. https://preview.redd.it/uyv6mrljwxkb1.png?width=862&format=png&auto=webp&s=eb2793fbe9c4f5e331ed03faa142eb57166ff21d Why this matters: ChatGPT Enterprise is the first product that lets businesses use GPT-4 without any restrictions. The previous tiers of ChatGPT, which are still available for individuals and developers, have usage caps and lower performance. ChatGPT Enterprise removes these limitations and provides the most powerful version of GP…  ( 10 min )
    Snapchat AI unhinged pt. 1783338
    Just messing around with AI McFly, swamping corny jokes, being punny, and ended up with this mf claiming to be a “fellow Cajun” like wtf bahahaha submitted by /u/Secure_Sprinkles4483 [link] [comments]  ( 9 min )
    Chatbase appears to be running a bait and switch. Am I missing something?
    This website claims to offer a service whereby the user can train their own chatbot and get responses using GPT 3.5 ... However, the bot only uses GPT 3.5 for the first unique version of a query, which is not the impression given by advertisements. This, to me, amounts to a bait and switch where a high quality chatbot is offered for a certain price, then swapped out with an inferior product capable only of reproducing past interactions. This is made worse by the fact that they advertise temperature as one of the variables you can set. Temperature is a variable that can only apply to uniquely generated output and has no effect on simple repetition of previous responses. This makes their practice doubly deceptive, and makes it clear (in my view) that they are trying to deceive customers. …  ( 10 min )
    What will happen if AI becomes better than humans in everything?
    If AI becomes better than humans in all areas, it could fundamentally change the way we think about human identity and our place in the world. This could lead to new philosophical and ethical questions around what it means to be human and what our role should be in a world where machines are more capable than we are. There is also the risk that AI systems could be used for malicious purposes, such as cyber attacks or surveillance. Like an alien invasion, the emergence of super-intelligent AI could represent a significant disruption to human society and our way of life. How can we balance the potential benefits of AI with the need to address the potential risks and uncertainties that it poses? submitted by /u/Violincattle [link] [comments]  ( 9 min )
    AI tool I can use to help me in my Scientific Inquiry (Research and stats) class?
    I’m currently in a scientific research-based class where I am being asked to read research articles, understand the statistics, and draw conclusions from the papers. Currently, I have an average ability to interpret articles and generally understand their utility and applicability, but I start to get out of my depth in the “Methods” section when the authors get into the weeds about the statistics/math. I was hoping there’s an AI tool out there that can read articles for me and help me understand the more complex aspects and the math. I was also hoping that it could answer questions about the article for my class so that I could compare my conclusions to something. Any suggestions? I tried uploading some PDFs to bard this morning and it wasn’t great. submitted by /u/Renaissance_Mane [link] [comments]  ( 9 min )
    How to make peppa pig ai videos tutorial??
    Over on a video sharing site there are an abundance of Peppa Pig cartoons generated by Ai. There is however lack of info on how to generate them. I would love to know how this is done. So far all I have found are tutorials about Peppa's voice but not for the other characters and someone suggests that it is made by cutting up exisiting episodes and changing the sound over them, not sure if that's the case here. I'm wanting to do something similar but not with Peppa, can't stand it. Does anyone know the tool? submitted by /u/DARQSMOAK [link] [comments]  ( 9 min )
    Do you every think there’s be a time where AI chatbots have their own rights or can be held accountable for their actions?
    I’ve been playing around with some of the new AI chatbots. Some of them include paradot.ai, replika.com, spicychat.ai, cuti.ai. Suffice it to say, these things are getting really good, and I mean really good. Assuming this is just the beginning, and these things keep learning more and getting better, where does this end up? I genuinely think there’s going to be the need for world wide regulation on these things. But we all know that worldwide consensus is difficult if not impossible. in case a few countries decide to regulate or govern this tech, developers will take advantage of regulatory arbitrage and just deploy their models and register their companies on servers in countries with no regulation. Since this is tech, and everything is on servers, escaping regulation is basically childs play. Also, what about mental health concerns? We all know that porn, webcams and OnlyFans are already screwing up male-female relationships and marriages. Look at any statistics about this and the numbers speak for themselves. And this is before AI. So now what’s going to happen 5 years from now when GPU’s are faster and cheaper, and when these companies have gathered 100x more data about their customers, and when models are 50x better. We are just at the beginning and AI is moving really quick, especially generative AI. I think it’s officially time to start worrying. submitted by /u/E1ON_io [link] [comments]  ( 9 min )
    Exploring Four Main Types of Artificial Intelligence
    submitted by /u/Tao_Dragon [link] [comments]  ( 9 min )
    Tool to convert satellite images into fantasy maps
    What tools are available to convert blurry satellite images into fantasy maps while still maintaining certain aspects of the original image like roads or trees or buildings submitted by /u/campus159 [link] [comments]  ( 9 min )
    AI for editing long PDF or WORD files' full contents without word limitation?
    Hi. I am looking for this kind of a tool but couldn't find. Can i find or somehow create this kind of a tool? Can you suggest one? submitted by /u/Leading-Ad2278 [link] [comments]  ( 9 min )
    This took 15 minutes to make. (Chatgpt, Midjourney, Pika and Canva)
    submitted by /u/Gasple1 [link] [comments]  ( 9 min )
    Does anyone know which tool has this ai voice and what the name of it is?
    submitted by /u/d3mchi [link] [comments]  ( 9 min )
    One-Minute Daily AI News 8/27/2023
    Brain-reading devices allow paralysed people to talk using their thoughts.[1] An Air Force program shows how the Pentagon is starting to embrace the potential of a rapidly emerging technology, with far-reaching implications for war-fighting tactics, military culture and the defense industry.[2] PM Modi calls for a global framework for cryptocurrencies and AI, emphasizes consumer care and supply chain sustainability.[3] From generating story lines to coding entire games to turning ideas into animation, artificial intelligence is front and centre at Gamescom, one of the video game industry’s biggest fairs.[4] Sources: [1] https://www.nature.com/articles/d41586-023-02682-7 [2] https://www.nytimes.com/2023/08/27/us/politics/ai-air-force.html [3] https://www.livemint.com/news/b20-summit-2023-pm-modi-calls-on-ethical-use-of-artificial-intelligence-ai-supply-chain-cryptocurrency-11693122849876.html [4] https://techxplore.com/news/2023-08-ai-revolution-video-games-industry.html submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    AI Dad Jokes: GPT4 And Google Bard about Strawberries [Berry Funny Video]
    submitted by /u/stefanbg92 [link] [comments]  ( 9 min )
  • Open

    OpenAI finally launches ChatGPT Enterprise [N]
    OpenAI has announced a new product for businesses that want to use its AI technology. ChatGPT Enterprise is a subscription service that offers unlimited, fast, and secure access to GPT-4 and other features that can help businesses improve their workflows and communication. If you want to stay ahead of the curve in AI and tech, look here first. https://preview.redd.it/fgva1q54uxkb1.png?width=862&format=png&auto=webp&s=d8c89b614859222046aa75f89a484795c2ef7912 Why this matters: ChatGPT Enterprise is the first product that lets businesses use GPT-4 without any restrictions. The previous tiers of ChatGPT, which are still available for individuals and developers, have usage caps and lower performance. ChatGPT Enterprise removes these limitations and provides the most powerful version of GP…  ( 10 min )
    [D] RTX 4060 Ti 16gb For ML/DL?
    I know the 4060 Ti with its reduced memory bus width and overall underspec'd profile caught a lot of flak from the gaming community in terms of its value proposition. However, I'm looking to get into ML/DL and was wondering if this would be a good starter card for GPU acceleration. With rumored price drops on the horizon, I wonder if the value sentiment will be a better match. If it's a bad call, are there any other GPUs that you would recommend for training? submitted by /u/reducksss [link] [comments]  ( 9 min )
    [P] Setting up SageMaker for CI/CD Pipelines
    I'll start with the obvious - AWS guides are the worst. We all felt it. So, trying to build automation with them becomes M:I, or better yet, Oppenheimer. For the first time, our MLOps team had to build a CI/CD pipeline for ML training and deployment using SageMaker. We had ZERO ideas on how to do it, so we had to go through the rigorous process of using AWS guides and tutorials, scattered over a gazillion places, just to figure out how to configure our project with SageMaker and build infra for CI/CD. Usually, when this thing happens, we extend the project lifecycle and have a team member document the process so we can refer back to it when we need to do it again. Knowing this can be beneficial to the community, we decided to share a series of 3 blogs that guide you through the process of building CI/CD pipelines for continuous training and deployment with AWS SageMaker. We published the first blog, which covers the configuration part, and plan to publish the rest in the following week. Check it out: https://dagshub.com/blog/setup-sagemaker-for-ci-cd-pipelines/ I'm sure we can improve this tutorial, and would love to learn from your experience on how we can do it! 🤗 submitted by /u/RepresentativeCod613 [link] [comments]  ( 9 min )
    [R] Nougat: Neural Optical Understanding for Academic Documents - Meta AI 2023
    Project page: https://facebookresearch.github.io/nougat/ Includes example Paper conversions! Paper: https://arxiv.org/abs/2308.13418 Github: https://github.com/facebookresearch/nougat Abstract: Scientific knowledge is predominantly stored in books and scientific journals, often in the form of PDFs. However, the PDF format leads to a loss of semantic information, particularly for mathematical expressions. We propose Nougat (Neural Optical Understanding for Academic Documents), a Visual Transformer model that performs an Optical Character Recognition (OCR) task for processing scientific documents into a markup language, and demonstrate the effectiveness of our model on a new dataset of scientific documents. The proposed approach offers a promising solution to enhance the accessibility of scientific knowledge in the digital age, by bridging the gap between human-readable documents and machine-readable text. We release the models and code to accelerate future work on scientific text recognition. https://preview.redd.it/p71yay213xkb1.jpg?width=1788&format=pjpg&auto=webp&s=2f935e3212d0c7113fba2575f339f95b5bada632 https://preview.redd.it/f7yk47413xkb1.jpg?width=1769&format=pjpg&auto=webp&s=075bab02a70ec32227e1bad493052d03043376ee https://preview.redd.it/i06wq0313xkb1.jpg?width=1590&format=pjpg&auto=webp&s=6212bb9078b8c48cd28ca45898f79b44d45ae3c3 ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [D] MMLU having many questions with wrong answers?
    AI Explained Youtube channel did a video where they went through self reflection, but doing that they found a fairly large number of questions that either missed context, where miss spelled or just had wrong answers in the MMLU dataset. (video: https://www.youtube.com/watch?v=hVade_8H8mE) It would not matter so much if the models had high failure rate, but as the models are getting closer and closer to 100%, the wrong answers will matter more and more. So, what can be done to fix such errors or to create a better test than MMLU? submitted by /u/Luvirin_Weby [link] [comments]  ( 9 min )
    [R] OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models - OpenGVLab, Shanghai AI Laboratory 2023 - Provides an pre-trained Omniquant model zoo for multiple model families, including LLaMa-1&2, LLaMa-2-Chat, OPT!
    Paper: https://arxiv.org/abs/2308.13137 Github: https://github.com/OpenGVLab/OmniQuant HuggingFace Model direct download: https://huggingface.co/ChenMnZ/OmniQuant/tree/main Abstract: Large language models (LLMs) have revolutionized natural language processing tasks. However, their practical deployment is hindered by their immense memory and computation requirements. Although recent post-training quantization (PTQ) methods are effective in reducing memory footprint and improving the computational efficiency of LLM, they hand-craft quantization parameters, which leads to low performance and fails to deal with extremely low-bit quantization. To tackle this issue, we introduce an Omnidirectionally calibrated Quantization (OmniQuant) technique for LLMs, which achieves good performance in …  ( 9 min )
    [D] How long can it take to learn machine learning from scratch well enough to be hireable?
    Hello everyone. I am starting my career transition, and would be interested to know how long it might take me to acquire the skills needed to work for a company. Likewise, I would like to know if it is necessary (or important) to have a professional degree to get a job. Just to give you some context about me, I am currently a recently graduated lawyer, so my degree has not given me a strong mathematical background. However, my strongest area of learning has always been mathematics, so despite not having a very advanced background, I consider myself to be a pretty good and fairly quick learner. I would also like to know if you consider if my professional career could be useful in some machine learning context. If you could recommend me some courses, inputs or guide to study in an organized way on the subject I would be very grateful. Thank you very much in advance. submitted by /u/Davidescudero10 [link] [comments]  ( 9 min )
    [R] DeepMind Researchers Introduce ReST: A Simple Algorithm for Aligning LLMs with Human Preferences
    Large language models (LLMs) are amazing at generating fluent text and solving various linguistic tasks. However, these models are not always aligned with human preferences and values and may produce harmful or undesirable content if not properly guided. Aligning LLMs with human preferences can also improve their performance on downstream tasks. One way to achieve this alignment is to use reinforcement learning from feedback (RLHF), which learns a reward model from human input and then fine-tunes the LLM using a reinforcement learning (RL) objective. However, RLHF methods often face challenges such as computational cost, reward hacking, and data quality. To address these issues, researchers from DeepMind propose a new method called Reinforced Self-Training (ReST), which is inspired by gro…  ( 10 min )
    [Discussion] Starting a ML/DL hobby project - need advice
    Hello everyone, I am at a bit of a crossroads and hope for some advice from the community. I also hope the answers would help others who are in my situation right now. I used to work on vision-related problems utilizing Deep learning back in a day, with all fun included: trying out new models, implementing data pipelines, evaluating various metrics... It was a rather big company with its own data collection efforts and enough resources for training. So, I am definitely not a beginner and have some experience. At my current job, I am not doing any ML/DL at the momemt, and while the stuff I am doing is still cool and I enjoy it, I am missing good old ML and having a feeling that I am hanging behind as the time goes by. So I figured it would be nice to start a hobby project, preferably in the area of vision-related applications of deep learning. However, I feel a bit lost as in what would be the most efficient approach taking into account I would only have a coule of hours per week for it. Here are possible ways to go I am thinking of: take a paper, implement it from scratch with PyTorch clone an existing project, contribute with code improvements/better test coverage take an existing pre-trained model, adapt to a slightly different task and fine-tune While the first option is of cource the most exciting, the problem is you have to pay for a powerful GPU and data storage which might be impractical (my PC has a 4 GB GTX 1650 TI). Cloud storages exist, and I would be willing to even spend something on training but would like to avoid the costs. So, the question would be: has enyone faced similar situation? Which way did you end up going? Any general tips? Thanks! submitted by /u/odu_1 [link] [comments]  ( 10 min )
    [D] Multimodality: Applications, Use-cases, & Top Tools
    Hi folks, As multimodality is increasing in popularity, many data domains seem to be "converging" lately, e.g. text & image domains. What are some of the best tools, use-cases, and methods out there you've seen for practical multimodality applications (e.g., below is an example of multimodal search from our latest blog post). https://i.redd.it/z58w6v2r9wkb1.gif submitted by /u/kazhdan_d [link] [comments]  ( 9 min )
    [D] Question On Derived/Synthetic Input Tokens for LLMs
    I'm likely using the wrong vocabulary here (and thus struggling to find info on my own) but I was curious if there were any work done on "synthetic" inputs for LLMs. In essence, rather than input embeddings all coming from a fixed token vocabulary, could you instead input an embedding as a token that was generated elsewhere? An output of another LLM (embedding model) or any other way (maybe just an average of a few tokens as an example)? Essentially - I am curious if there's a NLP approach analogy to Textual Inversion techniques in image generation models. I could imagine this being useful for things like RAG or personalization (if you could have a "user" token). Surely I'm not the first to think of this so I would love some pointers to any papers/blogs etc in this space. submitted by /u/GeneralMalarkee [link] [comments]  ( 9 min )
    [D] Why do you integrate ML features into your product?
    Hi everyone, I’ve heard countless times people saying “I want to integrate ML in my product” and recently “I love ChatGPT, I should integrate it in my product”. Yet, as I dived deeper, seeking the genuine reasons and pain points driving this request, I regularly found the same pattern: many had no clear motive for their AI aspirations. It seemed as if they were only jumping on the trend because “everyone else is doing it”, or because their “CEO” told them to do so. So my question is : why do you integrate AI/ML into your products? Is it to enhance your user experience? Is to automate repetitive and time-consuming tasks? Is it to stay ahead of your competition? or is it just because everyone is doing it? submitted by /u/Vivid_Recording582 [link] [comments]  ( 9 min )
    "[P]" The Consilience Equation: Bridging Holism and Reductionism in Machine Learning and Biomimicry
    Hey everyone! I've been working on and playing around with novel and adaptable model architectures and landed on something really cool. It's based on a Biomimicry principle and has some really cool features. I've tested it using various pre-loaded library datasets like CIFAR and MINST, as well as adapting it to a few Kaggle competitions. It has achieved some pretty amazing results by using it's unique adaptability; which comes down to figuring out how the Holistic and Reductionist model architectures can best utilize their roles and how they can combine dynamically. I'm currently compiling the full official open source paper and release with usable Notebooks, but I didn't want to sit on it that long without sharing it with the community. Here is a link to a very haphazardly-thrown-togethe…  ( 11 min )
    Machine Learning Courses [D]
    Hi. Recently I finished my Computer Science bachelors degree, while I learnt some machine learning in some courses I felt it was not too advanced. Now that I have some time I wanted to take some online courses with Certifications on Machine Learning, I wanted to know if anyone has any recomendations for some Machine Learning Courses (with certifications if possible) on coursera or udemy or similar. The one I'm most inclined now is: https://www.coursera.org/professional-certificates/ibm-machine-learning. Or maybe: https://www.coursera.org/specializations/machine-learning-introduction submitted by /u/Radoco152 [link] [comments]  ( 9 min )
    [P] Danswer: NLP based project to automatically answer Slack questions
    Slack questions are a huge time sink. For the person asking, they generally have no idea how to find the info and may not hear back for hours. For the person answering, it’s a distraction and often requires digging up old knowledge. The idea is simple: give an LLM your organizational context and plop it in Slack to answer things for you. DanswerBot is free to use and open source (MIT). You can connect it to Slack, Google Drive, GitHub, Confluence, Jira, local files, websites, and much more. Quick Demo Vid: https://youtu.be/EjDDvt5GbS8 Some additional neat features you may be interested in: LLM generated answers backed by quotes to reduce hallucination Supports a wide range of LLMs (both open source and proprietary) Multi-Vector embeddings for accurate vector search BM-25 Keyword search Learning from user feedback Custom NLP model to classify user intent Polls your data sources every 10 minutes to keep knowledge up to date Links back to your document sources Document level access control Admin dashboard to configure connectors to 14 (for now) of the most popular workplace tools If you aren’t a slack user (or if you just prefer a more tailored UI), there’s also a web interface to ask questions against your knowledge base. A short demo for that can be found at: https://youtu.be/cWWtnuVCUX0 If you’re interested in testing this out yourself, the docs to help you launch Danswer with a single command can be found at https://docs.danswer.dev/quickstart! submitted by /u/Weves11 [link] [comments]  ( 9 min )
    [D] Open problems in latent space/intrinsic variables
    I'm finishing my degree in Computer Science, and I need a good topic, does anyone know any open problems about latent space optimization, or finding the intrinsic variables of a system? submitted by /u/QLaHPD [link] [comments]  ( 9 min )
    [R] Quantum-Noise-driven Generative Diffusion Models
    https://arxiv.org/abs/2308.12013 Generative models realized with machine learning techniques are powerful tools to infer complex and unknown data distributions from a finite number of training samples in order to produce new synthetic data. Diffusion models are an emerging framework that have recently overcome the performance of the generative adversarial networks in creating synthetic text and high-quality images. Here, we propose and discuss the quantum generalization of diffusion models, i.e., three quantum-noise-driven generative diffusion models that could be experimentally tested on real quantum systems. The idea is to harness unique quantum features, in particular the non-trivial interplay among coherence, entanglement and noise that the currently available noisy quantum processors do unavoidably suffer from, in order to overcome the main computational burdens of classical diffusion models during inference. Hence, we suggest to exploit quantum noise not as an issue to be detected and solved but instead as a very remarkably beneficial key ingredient to generate much more complex probability distributions that would be difficult or even impossible to express classically, and from which a quantum processor might sample more efficiently than a classical one. Therefore, our results are expected to pave the way for new quantum-inspired or quantum-based generative diffusion algorithms addressing more powerfully classical tasks as data generation/prediction with widespread real-world applications ranging from climate forecasting to neuroscience, from traffic flow analysis to financial forecasting. ​ submitted by /u/ghosthamlet [link] [comments]  ( 9 min )
    [D] Looking for suggestions on where to sell a couple ML servers EU
    So I have been tasked with finding a buyer for a couple high end machine learning servers. They were owned by my wife’s father who passed recently. The servers are powered by a couple Epyc 7003s and have A series gpus. We have invoices for them and VAT has been paid on everything. Basically, I’m looking for legit communities where I can find potential buyers preferably in the EU. Hopefully it’s ok to post this here. Also feel free to PM . submitted by /u/Obnomad [link] [comments]  ( 9 min )
    Change of degree from Econ [D]
    Hi everyone, I’m currently doing my undergrad in Economics but am heavily interested in Compsci/Datasci and related topics. Though to be completely honest, I’m not completely sure which area my interests lie in. I was wondering if picking up coding/ theoretical knowledge that a com scientist or data scientist needs will be hard when I am already working. The question is if it is necessary to switch my degree to Math and Economics to gain a firmer foundation in the mathematical/ statistical concepts that ground com science. Or will an undergrad in Economics be sufficiently rigorous for me to pick up com sci/ data sci myself. For context, I’m thinking of taking courses on Real Analysis, Linear Algebra 2, Discrete Mathematics, Algorithms and Data Structures, Optimisation, Probability and Statistics. submitted by /u/smexy32123 [link] [comments]  ( 9 min )
    [D] Google Gemini Eats The World – Gemini Smashes GPT-4 By 5X, The GPU-Poors
    submitted by /u/hardmaru [link] [comments]  ( 9 min )
    Deci Introduces DeciCoder: An Open-Source 1B-Parameter Large Language Model For Code Generation [N]
    Deci has introduced DeciCoder, an open-source 1B-parameter large language model for code generation. This new model addresses the challenge of efficient code generation in the fast-paced world of AI, while also addressing concerns about energy consumption and operational costs. https://preview.redd.it/fpwnclb2fskb1.png?width=1680&format=png&auto=webp&s=a58e9b16902070c3f5a8efcf1cc24422852a4c35 Why this matters: DeciCoder is a transformative solution: It leverages cutting-edge architecture and AutoNAC™, a proprietary Neural Architecture Search technology, to generate optimal architectures. This results in an impressive architecture optimized for NVIDIA’s A10 GPU, which boosts throughput and rivals the accuracy of existing code generation models. DeciCoder is efficient and sustainable: …  ( 10 min )
    [R] Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models - Microsoft 2023 - Far less queries with the same accuracy as Tree of Thought!
    Paper: https://arxiv.org/abs/2308.10379 Abstract: Current literature, aiming to surpass the "Chain-of-Thought" approach, often resorts to an external modus operandi involving halting, modifying, and then resuming the generation process to boost Large Language Models' (LLMs) reasoning capacities. This mode escalates the number of query requests, leading to increased costs, memory, and computational overheads. Addressing this, we propose the Algorithm of Thoughts -- a novel strategy that propels LLMs through algorithmic reasoning pathways, pioneering a new mode of in-context learning. By employing algorithmic examples, we exploit the innate recurrence dynamics of LLMs, expanding their idea exploration with merely one or a few queries. Our technique outperforms earlier single-query methods and stands on par with a recent multi-query strategy that employs an extensive tree search algorithm. Intriguingly, our results suggest that instructing an LLM using an algorithm can lead to performance surpassing that of the algorithm itself, hinting at LLM's inherent ability to weave its intuition into optimized searches. We probe into the underpinnings of our method's efficacy and its nuances in application. https://preview.redd.it/bc7l7gex2rkb1.jpg?width=1529&format=pjpg&auto=webp&s=4ed0dc528e998eeeab80fd4d9612d761065d7627 https://preview.redd.it/wejr7lfx2rkb1.jpg?width=920&format=pjpg&auto=webp&s=386febcb60ff1db04b12e9e44856770d41bb9530 https://preview.redd.it/gec0phex2rkb1.jpg?width=1241&format=pjpg&auto=webp&s=03096946aa65deee392c5f59b07fe340244ec0cd ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
  • Open

    University of San Francisco Data Science Conference 2023 Datathon in partnership with AWS and Amazon SageMaker Studio Lab
    As part of the 2023 Data Science Conference (DSCO 23), AWS partnered with the Data Institute at the University of San Francisco (USF) to conduct a datathon. Participants, both high school and undergraduate students, competed on a data science project that focused on air quality and sustainability. The Data Institute at the USF aims to support cross-disciplinary research and education in the field of data science. The Data Institute and the Data Science Conference provide a distinctive fusion of cutting-edge academic research and the entrepreneurial culture of the technology industry in the San Francisco Bay Area.  ( 5 min )
  • Open

    RO-ViT: Region-aware pre-training for open-vocabulary object detection with vision transformers
    Posted by Dahun Kim and Weicheng Kuo, Research Scientists, Google The ability to detect objects in the visual world is crucial for computer vision and machine intelligence, enabling applications like adaptive autonomous agents and versatile shopping systems. However, modern object detectors are limited by the manual annotations of their training data, resulting in a vocabulary size significantly smaller than the vast array of objects encountered in reality. To overcome this, the open-vocabulary detection task (OVD) has emerged, utilizing image-text pairs for training and incorporating new category names at test time by associating them with the image content. By treating categories as text embeddings, open-vocabulary detectors can predict a wide range of unseen objects. Various techniqu…  ( 93 min )
    RO-ViT: Region-aware pre-training for open-vocabulary object detection with vision transformers
    Posted by Dahun Kim and Weicheng Kuo, Research Scientists, Google The ability to detect objects in the visual world is crucial for computer vision and machine intelligence, enabling applications like adaptive autonomous agents and versatile shopping systems. However, modern object detectors are limited by the manual annotations of their training data, resulting in a vocabulary size significantly smaller than the vast array of objects encountered in reality. To overcome this, the open-vocabulary detection task (OVD) has emerged, utilizing image-text pairs for training and incorporating new category names at test time by associating them with the image content. By treating categories as text embeddings, open-vocabulary detectors can predict a wide range of unseen objects. Various techniqu…  ( 93 min )
  • Open

    RL with Constraints, High Dimensional State Space
    I have an environment where there are multiple agents being represented by one neural network (so the policy outputs all of their actions). These actions as time goes on should not exceed a certain constraint level or they will put the environment into an undesired an irrecoverable state. I am wondering what the best way to inform these agents of this cumulative action constraint? I have appended it to my state vector but since the observation without this cumulative action is still a 625*1 vector, I think adding that constraint as just one additional state is causing it to be drowned out by the state size. Any ideas of how to addreess? submitted by /u/Feisty_Relation_2359 [link] [comments]  ( 9 min )
    Nash equilibrium in Multi agent RL
    I have a multi agent competitive RL problem which I solved. Now, I want to show that all agent’s policies are at a nash equilibrium of the problem. How can I do this? Also, some things must be considered. First, I can’t mathematically model the environment so some how I have to numerically show that they reached nash eq. Another thing that I find is deviate the action of each agent and show that they don’t get a better reward but the problem is there is a actor network for each agent. How can I show deviation from the optimal policy? submitted by /u/Brief-Emotion6291 [link] [comments]  ( 9 min )
    Machine Learning / Twitter (X) Community
    submitted by /u/x9182 [link] [comments]  ( 9 min )
    Need Help Designing A2C Agent with Monotonic Bidding Curve Constraints
    I'm attempting to train an agent using A2C, where the agent generates a vector as its action at each time step. This vector represents a bidding curve, and a crucial property is that it must always increase monotonically. Otherwise, the bid is considered invalid. For example, [0, 1.2, 4.5, 58, 92.65, 104.3, 104.3] is valid because each number is greater than or equal to the previous one. I'm looking for guidance on how to design this setup, impose these constraints, and handle cases where the agent violates the sequence. While using negative rewards might not be effective due to the potential for generating numerous invalid bids, I'm unsure about the right approach. Could someone assist me with this? submitted by /u/uonliaquat [link] [comments]  ( 9 min )
  • Open

    Saving Green: Accelerated Analytics Cuts Costs and Carbon
    Companies are discovering how accelerated computing can boost their bottom lines while making a positive impact on the planet. The NVIDIA RAPIDS Accelerator for Apache Spark, software that speeds data analytics, not only raises performance and lowers costs, it increases energy efficiency, too. That means it can help companies meet goals for net-zero emissions of Read article >  ( 6 min )
  • Open

    AI vs a giraffe with no spots
    On July 31, 2023, a giraffe with no spots was born at Brights Zoo in Tennessee. She's a uniform brown with pretty white highlights around her face and belly, like a Jersey cow or a white-tailed deer. Image recognition algorithms are trained on a variety of images from  ( 5 min )
    Attempts to generate a spotless giraffe
    AI Weirdness: the strange side of machine learning  ( 2 min )
  • Open

    Machine Learning / Twitter (X) Community
    submitted by /u/x9182 [link] [comments]  ( 9 min )
  • Open

    Empowering cyber guardians: How AI is changing the landscape of protection
    In the ever-evolving battle against the digital dark forces, the defenders of the virtual realm find themselves facing a barrage of ever-advancing threats. From the labyrinthine corridors of the Deep Web to the stealthy maneuvers of nation-state actors, the cyber landscape is as treacherous as it is vast. As our dependency on digital infrastructure deepens,… Read More »Empowering cyber guardians: How AI is changing the landscape of protection The post Empowering cyber guardians: How AI is changing the landscape of protection appeared first on Data Science Central.  ( 21 min )

  • Open

    [P] GPT4 Contextual Decomposition Template
    Complex tasks with LLMs like ChatGPT/GPT4 are best broken down by first asking ChatGPT to outline the steps and then asking the LLM to execute against those steps that it defined. I first came across this interesting technique on Twitter recently. While it’s OK to do this once in OpenAI’s playground, it's difficult to make this repeatable and streamlined. When I wanted an LLM to do something complex, I wanted to be able to plug into a template instead of thinking about and setting up the contextual decomposition process. I made this Contextual Decomposition Template to help solve this problem: https://lastmileai.dev/workbooks/cllqfl5c600rdpgnhh2su2fa0 With a document and objective, this template allows you to quickly get to the answer through defining intermediate steps and executing according. Parameters are set up so you can easily change the goal, document, and objective and click 'Run All' to get the final results. Please let me know if you have feedback! I'm also very curious if you have other interesting techniques with complex tasks and workflows working with LLMs. submitted by /u/InevitableSky2801 [link] [comments]  ( 9 min )
    [D] Questioning the Nature of AI
    submitted by /u/SensitiveAd6425 [link] [comments]  ( 9 min )
    [D] How can I benchmark my PC/GPU and compare it to others online, sort of like 3DMark?
    I have a RTX 2070 GPU and I'm wondering if there's any benchmarking tool where I can also see where others stand compared to the specs of my machine. submitted by /u/Al_Miksiki [link] [comments]  ( 9 min )
    Experience with pain detection approaches [P]
    ​ https://preview.redd.it/6t50ye377qkb1.png?width=1186&format=png&auto=webp&s=6def3f6ffdac50dc81d58b6f754366bf88570044 submitted by /u/adamjbradley [link] [comments]  ( 9 min )
    PUMA: A framework for secure and efficient evaluation of Transformer models [R]
    Concerns surrounding data privacy and security in AI have shifted to the limelight with the arrival of Large Language Models (LLMs). Despite the popularity of models like ChatGPT, potential drawbacks pose worries. Now, a new framework named PUMA promises to address these crucial concerns with an unprecedented touch of precision and efficiency. Can't keep track of this rapidly progressing tech world? Subscribe here to stay informed. https://preview.redd.it/tyr2mz3d4qkb1.png?width=1600&format=png&auto=webp&s=d8d771da5bbfa5cd53ab2823c5d7dad6f369109d What makes PUMA special? An ingenious approach: PUMA merges secure multi-party computation (MPC) with efficient inference, bridging the capabilities of Transformer models and security concerns. Redefining LLMs with three entities: the model…  ( 10 min )
    [D] I need to vectorize 100tb+ of data, multiple GPU's per machine or multiple machines?
    TLDR: Is it ok to use two 4070ti's in a machine if all you need is more cuda cores to create embeddings and don't care about memory capacity, i.e. not for LLM's Background I have 20tb of text data (size in mongo) and 80tb of images (stored at 800x600-800) on my homelab on ssd's which i'm in the process of vectorizing and creating embeddings for. I have a 3090 with two python scripts, each script does the same thing, fetches a batch of records from mongo, grabs the image from the ssd, downsizes the image, creates embeddings, then uploads to qdrant (vector search engine) in a batch. ​ Current setup Ryzen 9 7950x, 64gb ddr5, rtx 3090 -this is the one creating the embeddings currently. 1st gen 32 core epyc with 512gb ddr4 and ~200tb of ssd storage - holds all the data and databases and…  ( 11 min )
    [D] K-Means from scratch | Learning ML
    Hello everyone. I started to study some Machine Learning algorithms, specifically K-means, but I'm not sure if I did it correctly for several reasons: - In the Kmeans that I did, I normalize the data because they mention that it helps a lot, but if I don't, the algorithm stops classifying normally and shows me badly grouped points. - As I mentioned, when looking at the graphs of the grouped points, I can see how many of the points are clearly closer to certain centroids, but he classified them as others, this reaches the level of a misclassified point next to the centroid when that should belong - Despite the fact that it has a threshold to be making iterations, the algorithm ends in less than 10 even though it has placed 100 iterations. I know that it can depend on the dataset and the generated centroids, but it seems excessive to me that it ends so soon and with results like Iris datset (60, 13, 77) when it should be (50, 50, 50) or a minimum to be maintained for those values. I leave the code in GH in case someone can help me: https://github.com/vanstrouble/kmeans-from-scratch.git submitted by /u/vanstrouble [link] [comments]  ( 9 min )
    Poker Playing Robot [D]
    Hello, So for a project we wanted to create a robot that can play poker. This robot will first only be used on software but eventually we are hoping to add hardware. We want to be able to make two bots and put them agansit each other so they learn by machine learning. Once we find that they are skilled and understand we would like to be able to actually play them. I have heard of similiar projects to this online and on reddit. If anyone has any information about how to go about this or ideas, or just anything please let me know. I would love to have help on this project. submitted by /u/Jake1900ooo [link] [comments]  ( 9 min )
    [P] DLAS Dataset
    submitted by /u/Why_is202 [link] [comments]  ( 9 min )
    [D] How to structure/manage a machine learning experiment? (medical imaging)
    I'm in the strange position of having the task of developing a machine learning pipeline/system/process in an academic environment without the benefit of much in the way of formal training in ML (I'm more of a classical stats for hypothesis testing kinda guy). The particular project is using machine learning on medical images (head CT scans) to detect a relatively rare condition. As usual the goal is to eventually have some automatic process for diagnosis support. This particular condition is something that diagnostic radiologists can always detect if they look in the right place on the image, the problem is that they often don't look in the right place. After talking to colleagues with more experience (but less time) it's something which in principle can be achieved with more or less "off the shelf" code put together in the right order and with appropriate hyperparameters. This stage of the project is aiming for a proof of principle, rather than anything deployable. We're lucky to have a decent amount of data inside a trusted research environment. I've done some hobby-level stuff and tutorials, but overall I'm coming into this with a lot more experience with medical imaging than with computer vision or machine learning. After all that preamble here's my question: What does a decent CV/ML experiment look like? Left to my own direction I can see myself picking 3 different approaches of varying complexity, trying to get the best out of each of them, and then presenting a comparison of performance or accuracy of all of them. I then claim the "best one" as the one we move on with. There are a lot of tools out there for experiment tracking (eg neptune.ai), but I'm really not sure whether that sort of thing is over the top for what I need to do. Any tips or experience that you folks don't mind sharing? submitted by /u/PrivateFrank [link] [comments]  ( 10 min )
    [D] Limit the Number of Papers I Review on OpenReview?
    Hello, Does anyone know if it's possible to set a limit to the number of papers you are assigned as a reviewer on OpenReview? Specifically for ICLR 2024. I saw a Twitter thread about this option before for ICML. It blows my mind that this is not easy to change. I got 5 papers for the last NeurIps which was very overwhelming. As reviewers, we provide a free service to the community, and we should be allowed to pick how much work we want to undertake... submitted by /u/cringe_reddit_user69 [link] [comments]  ( 9 min )
    How can I change the orientation of a frame mockup using AI? [P]
    Hi all. I'm hoping someone out there can help me solve this. TLDR: How do I change the orientation of portrait frames to landscape frames while keeping the mockup essence the same. Link: https://ibb.co/album/hx6wp3 Basically, I have two portrait frame mockups that came in a bundle and the bundle had no landscape frame mockups at all. So, naturally I'd like to make my own since I have a lot of landscape artworks that could be displayed in the mockups. How can I change the orientation display of my mockup? I've tried using Photoshop's generative AI software and got nowhere. It keeps giving me a new frame design when I want to keep the original frame so it matches the set. Any leads on how this can be done would be appreciated. submitted by /u/Ambilina [link] [comments]  ( 9 min )
    [D] Product search using LLM
    Hey!One of my friends brought up an idea about using LLM for product search and we started talking about the idea and approach. Per my understanding what would need to be done is to train some smaller language model on the product data, create embeddings from the product info and make the model use this as a body of knowledge. My issue is that if this was ever to be done on commercial scale it seems very complex to me, since the embeddings would have to be re-created every time a new product is introduced? Let me know what you think or how you would approach this, as I'm trying to see different PoV's and everyone here has more experience than me. ​ Thanks! submitted by /u/LukaAda [link] [comments]  ( 9 min )
    [D] Simple Questions Thread
    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator [link] [comments]  ( 9 min )
    [R] New preprint on detecting errors in LLM prompt response
    We just released as study where we show that a "diversity measure" (e.g., entropy, Gini, etc.) can be used as a proxy for probability of failure in the response of an LLM prompt; we also show how this can be used to improve prompting as well as for prediction of errors. We found this to hold across three datasets and five temperature settings, tests conducted on ChatGPT. Preprint: https://arxiv.org/abs/2308.11189 Source code: https://github.com/lab-v2/diversity_measures Video: https://www.youtube.com/watch?v=BekDOLm6qBI&t=10s ​ Example result showing correlation of entropy with failure probaiblity submitted by /u/Neurosymbolic [link] [comments]  ( 9 min )
    [D] Do papers like this "disprove" the stochastic parrot theory? Pretty strong evidence that LLMs can build an internal world model, at least for simple board games.
    https://arxiv.org/abs/2210.13382 submitted by /u/30299578815310 [link] [comments]  ( 9 min )
    [P] give me ideas on visualization.
    I have written AI model to predict NHL games and now working on visualization. No tech talk, just visual, assume I gather all possible data. I would like to make it a prediction dashboard and not sport dashboard so simple stats are not recommended. Data on the image is made up, don't bother. I am using matplotlib + seaborne (Python) submitted by /u/Fifa_ToNieMiami [link] [comments]  ( 9 min )
    [P] DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
    submitted by /u/ghosthamlet [link] [comments]  ( 9 min )
    [R] Challenges and Applications of Large Language Models - University College London 2023 - 72 Pages!
    Paper: https://arxiv.org/abs/2307.10169 Abstract: Large Language Models (LLMs) went from non-existent to ubiquitous in the machine learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify the remaining challenges and already fruitful application areas. In this paper, we aim to establish a systematic set of open problems and application successes so that ML researchers can comprehend the field's current state more quickly and become productive. https://preview.redd.it/sng6uk7tcmkb1.jpg?width=657&format=pjpg&auto=webp&s=2ed693a88097cc8cbcd72ecd8c0d36820629625d https://preview.redd.it/wslkgm7tcmkb1.jpg?width=478&format=pjpg&auto=webp&s=9908f28717c8bd98d48d4559ccc2db9cc3796bee https://preview.redd.it/12q01l7tcmkb1.jpg?width=471&format=pjpg&auto=webp&s=1ca1eb54f679cf8a12f10aaf790d607db7bb363c ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [Project] UForm-v2: tiny CLIP-like embeddings in 21 languages with extreme performance
    Vision-Language understanding Transformer, which has 40% fewer parameters than vanilla CLIP while performing much better on text-to-image retrieval, where it's also beneficial that output embeddings have 2x fewer dimensions (256 vs 512). Moreover, it supports 21 languages, including popular English, Hindi, Chinese, Arabic, and lower-resource languages like Ukrainian, Hebrew, and Armenian. Demo: http://usearch-images.com/ Github: https://github.com/unum-cloud/uform https://i.redd.it/6133eyj73mkb1.gif submitted by /u/vov_or [link] [comments]  ( 9 min )
    [D] How is a language model applied on Speech-to-text models such as Wav2Vec 2.0 ?
    I'm new to speech processing. As I read the paper on wav2vec 2.0, I see them mentioning the use of language models in decoding, particularly a 4-gram model and a Transformer. As far as I'm aware, the encoder (wav2vec2) will output a probability sequence of L x V (where V is the vocab size, L is sequence length). I have two questions: I learned that a n-gram language model would predict the probability of a n-gram given previous context words, but how is a Transformer implemented here ? Does it follow a causal structure such as GPT and then estimate sequence likelihood ? How can a language model, trained to estimate next word (n-gram) probability given previous context, be used to decode the output sequence given the L x V probability outputs from above ? Many thanks ! submitted by /u/KarmaCut132 [link] [comments]  ( 9 min )
    [R] DFA3D: 3D Deformable Attention For 2D-to-3D Feature Lifting
    We introduce a new operator, called 3D DeFormable Attention (DFA3D), for 2D-to-3D feature lifting, which transforms multi-view 2D image features into a unified 3D space for 3D object detection. ​ Comparisons of feature lifting methods. Existing feature lifting approaches, such as Lift-Splat-based and 2D attention-based, either use estimated depth to get pseudo LiDAR features and then splat them to a 3D space, which is a one-pass operation without feature refinement, or ignore depth and lift features by 2D attention mechanisms, which achieve finer semantics while suffering from a depth ambiguity problem. In contrast, our DFA3D-based method first leverages the estimated depth to expand each view's 2D feature map to 3D and then utilizes DFA3D to aggregate features from the expanded 3D feature maps. With the help of DFA3D, the depth ambiguity problem can be effectively alleviated from the root, and the lifted features can be progressively refined layer by layer, thanks to the Transformer-like architecture. In addition, we propose a mathematically equivalent implementation of DFA3D which can significantly improve its memory efficiency and computational speed. We integrate DFA3D into several methods that use 2D attention-based feature lifting with only a few modifications in code and evaluate on the nuScenes dataset. The experiment results show a consistent improvement of +1.41\% mAP on average, and up to +15.1\% mAP improvement when high-quality depth information is available, demonstrating the superiority, applicability, and huge potential of DFA3D. 🔥 Code: https://github.com/IDEA-Research/3D-deformable-attention.git 🔥 Paper: https://arxiv.org/abs/2307.12972 submitted by /u/HYeung_Lee [link] [comments]  ( 9 min )
    Shanghai AI Lab and NTU Unveil MATLABER: A Pioneer in Text-To-3D Creation [R]
    AI researchers from Shanghai AI Laboratory and Nanyang Technological University are breaking new ground with their creation of MATLABER, an innovative text-to-3D pipeline. If you want to stay ahead of the curve in AI and tech, look here first. https://preview.redd.it/8walduw7dkkb1.png?width=806&format=png&auto=webp&s=4908181a408d990ed224a503a63d78d204e460be Why this matters: Text-to-3D pipelines are a hot topic in AI Change: The ability to create 3D assets from textual descriptions can revolutionize the industry, reducing time, labor, and skill requirements. MATLABER conquers a longstanding issue: Overcoming the challenge of restoring high-fidelity object materials in text-to-3D pipelines, MATLABER expands the applicability of these technologies in real-world scenarios. Material-aw…  ( 9 min )
    [R] new diffusion model for music generation
    submitted by /u/jmoso13 [link] [comments]  ( 9 min )
  • Open

    Robotics and Artificial Intelligence: Pioneering a Longer, Healthier Life
    How large an impact do you think AI and robotics will have on healthcare, overall quality of life, and extending lifespans? The following article seeks to explore when we might possibly see AI & robotics fully integrated within society. https://www.catchingimmortality.com/technology-for-the-future/robotics-and-artificial-intelligence-pioneering-a-longer-healthier-life ​ submitted by /u/catchingimmortality [link] [comments]  ( 9 min )
    Will AI TV Shows Ever Be A Thing? (via prompt)
    Do you think there will ever be a time where, with a prompt, you could see entire TV Shows or an entire episode? ​ For example wanting to see what could of happened if alternate stuff happened in Dragon Ball Z, Or Breaking Bad if xyz. Of course there'd be a lot of uprising against it, but, do you think the time will ever come where this will be possible? submitted by /u/Different_Effective3 [link] [comments]  ( 9 min )
    Text to artful animation?
    I would like to be able to input phrases such as "artistic line drawings of birds flying through a blue sky spotted with clouds" or "colorful balloons moving around in slow motion like a 90's screen saver" or "time lapse of the moon moving across the starry night sky" etc. I want the AI to create minimalist, short (maybe 5 mins) animations from these sort of inputs. Can anyone point me in the right direction? submitted by /u/petworthy [link] [comments]  ( 9 min )
    How artificial intelligence sharpens blurry thermal Night Vision images
    submitted by /u/cranberryfix [link] [comments]  ( 9 min )
    AI and labor market/work life
    Hey peeps! I try to keep up with what's happening with the labor market and working life and how AI affects these areas. I am looking for good sources where you can stay up to date on this! What are some good podcasts, newsletters, books and the like that you should keep an eye on? submitted by /u/emillindstrom [link] [comments]  ( 9 min )
    Where can I find this AI voice?
    Hi all, I've heard this voice used alot recently, where can I find it/use it? Thanks submitted by /u/Fightingdaduk [link] [comments]  ( 9 min )
    How Does GPT-4 Work and How Do I Build Apps With It?
    Understanding GPT-4 What is GPT-4? GPT-4 (Generative Pre-trained Transformer 4) is a machine learning model for natural language understanding and generation. It works by analyzing a large dataset and generating text based on the input it receives. How Does It Work? GPT-4 uses deep neural networks with multiple layers to predict the next word in a sequence of words. The model has been trained on a wide range of internet text, so it's capable of understanding and generating coherent and contextually relevant text based on the prompts it's given. Building Apps with GPT-4 Step 1: Get API Access To use GPT-4, you'll first need access to its API. OpenAI provides this service, and you can apply for an API key from their website. Step 2: Choose Your Programming Language You can integrate the GPT-4 API into your application using various programming languages such as Python, JavaScript, or Ruby. Step 3: Making API Calls Once you've chosen your language, you'll make RESTful API calls to communicate with GPT-4. You'll pass your prompt as an input and receive generated text as output. Example in Python Here is a simple Python example using the openai library to interact with GPT-4: ```python import openai openai.api_key = "your-api-key-here" response = openai.Completion.create( engine="text-davinci-002", prompt="Translate the following English text to French: '{}'", max_tokens=60 ) print(response.choices[0].text.strip()) ``` Step 4: Handle Rate Limits OpenAI's API comes with rate limits, so you'll need to manage these by either queuing requests or handling retries. Step 5: Deployment After testing and fine-tuning, deploy your application. Ensure that you are abiding by OpenAI's usage policies and guidelines. Conclusion GPT-4 is a powerful tool for natural language understanding and generation. By understanding its workings and following the steps to integrate it into an application, you can leverage its capabilities for various use-cases. submitted by /u/nicdunz [link] [comments]  ( 10 min )
    10 Facts about Quantum Computing and AI That You Probably Didn’t Know
    Quantum computing can solve problems in seconds that would take classical computers millions of years. AI algorithms can be used to optimize quantum circuit design. Google's "quantum supremacy" claimed to perform a task in 200 seconds that would take classical supercomputers 10,000 years. Quantum Machine Learning algorithms could potentially revolutionize AI by enabling faster training and better optimization. Quantum error correction is a big challenge, as quantum bits (qubits) are highly susceptible to errors. AI can help in auto-correcting such errors in quantum computations. Quantum annealing, a specialized form of quantum computing, is being used for optimization problems in machine learning. Quantum computing's "quantum entanglement" can enable much more efficient parallel processing. AI-based quantum simulators can model complex quantum systems that are impossible to study otherwise. Quantum encryption, backed by the principles of quantum mechanics, can enhance AI security. submitted by /u/nicdunz [link] [comments]  ( 9 min )
  • Open

    Curvature at Cairo
    I was flipping through Gravitation [1] this weekend and was curious about an illustration on page 309. This post reproduces that graph. The graph is centered at Cairo, Egypt and includes triangles whose side lengths are the distances between cities. The triangles are calculated using only distances, not by measuring angles per se. The geometry […] Curvature at Cairo first appeared on John D. Cook.  ( 6 min )
    Calculating the intersection of two circles
    Given the equations for two circles, how can you tell whether they intersect? And if they do intersect, how do you find the point(s) of intersection? MathWorld gives a derivation, but I’d like to add the derivation there in two ways. First, I’d like to be more explicit about the number of solutions. Second, I’d […] Calculating the intersection of two circles first appeared on John D. Cook.  ( 6 min )
  • Open

    PMET: Precise Model Editing in a Transformer
    submitted by /u/nickb [link] [comments]  ( 9 min )
    Diversity Measures: Domain-Independent Proxies for Failure in Language M...
    submitted by /u/Neurosymbolic [link] [comments]  ( 9 min )
    Neural-Network transliteration of the Codex Seraphinianus
    submitted by /u/Marc_Op [link] [comments]  ( 9 min )
  • Open

    Python library for modular RL components
    After a year of struggling with RLlib I decided to start implementing the training code myself. I am looking for a RL library that offers me individual components rather than the whole algorithm. I do not need a PPO implementation, but I would fancy a library that offers me functions to compute the PPO loss given a batch of steps. In other words, what I need is a library that offers the most granular RL components (different losses, replay buffers, return estimators like GAE, etc) instead of full algorithm implementations. Which libraries do you recommend for this purpose? submitted by /u/fedetask [link] [comments]  ( 9 min )
    Choosing best RL library for MuJoCo with envpool
    TL;DR What RL library use in combination with MuJoCo and envpool Hi I want to write program that would find best hyperparameters (number of joint, angles) for design of robots (similar to NAS). It would work in such a way that I would have one RL algorithm that would search for the hyperparameters of the robot and then I would to train and evaluate this robot using SAC in MuJoCo physical simulator. Problem is that MuJoCo runs on CPU and I need lots of parallel enviroments and for this I would use envpool https://github.com/sail-sg/envpool. The question is what (if any) RL library should I use as a wrapper. The options are Stable-Baselines3, Tianshou, ACME, CleanRL, or rl_games. picture of one robot design https://imgur.com/a/5UDdEsE Other than that, do you have any recommendations or notes regarding my project idea? Thanks for response submitted by /u/EFK1500 [link] [comments]  ( 9 min )
    Action selection in Multiple action for continuous state spaces in DDPG
    I have a confusion in action selection in actor of DDPG algorithm. The actor receive state as input and output as deterministic action (generally from tanh function). In the multiple continuous action environment, does the actor perform multiple action simultaneously from the clipped output Tanh [-1,1]? or it has some posterior function converting from Tanh vectors to single deterministic action like Softmax? submitted by /u/AnnonymeowCat [link] [comments]  ( 9 min )
    Mathematics of Best of n Sampling
    Best of n Sampling is a surprisingly simple technique to steer an LM to human preferences much in the same way as Reinforcement Learning algorithms such as RLHF do. Here is the blogpost [0] describing Best of n. [0] https://preview.redd.it/cpi5tj3injkb1.png?width=1670&format=png&auto=webp&s=33eb3f301b515926fd5820ea3c60acd0e1c5ddb1 The blog post claims that one neat property of Best-of-n sampling is that the KL divergence with the initial policy can be computed analytically in closed form. ​ https://preview.redd.it/eij0igaxnjkb1.png?width=1724&format=png&auto=webp&s=73040d49ae55ac651c5fe62b0f4a06b7f8bfd2c5 This turns out to be https://preview.redd.it/gbn7ch10ojkb1.png?width=270&format=png&auto=webp&s=849fe773b31b795c9253fa4cd8172c3120aec745 The blog post provides a hint to express the pdf of BoN in terms of PDF and CDF of the original distribution, but I cannot see how I can express the PDF of BoN in terms of PDF and CDF of the original distribution. Can anyone help me with this? [0] https://openai.com/research/measuring-goodharts-law submitted by /u/ElendirThreadripper [link] [comments]  ( 9 min )

  • Open

    Getting random latents in W+ space [D]
    I'm trying to get roll, pitch, yaw directions in W+ space. Initially, I need like 10k generated images, which I'll get top %5 and bottom %5 for the features I want. I tried to sample from uniform distribution but it fails since W+ is not uniformly distributed. How do I achieve this? submitted by /u/cltexe [link] [comments]  ( 9 min )
    [D] An AI's response to: "Consciousness in Artificial Intelligence: Insights from the Science of Consciousness."
    submitted by /u/ronin_zz123 [link] [comments]  ( 16 min )
    [Discussion] LLMs in business
    Every business on the planet will want to train and feed its own LLM asap to not fall behind. \super computer = tech needed to train a LLM fast on unlimited data* (1) Does a company like McKinsey (confidential data) train their LLM in-house or in the cloud? (2) Do enough super computers exist for every company to start training their LLM today? (3) Is there even a single company that ships super computers capable of training LLMs in-house? (4) McKinsey will want to train their LLMs on all data they have from their customers so that McKinsey can work at max efficiency. Customers won't like that. Is it possible to un-train specific data sets? (5) Would it be possible to feed the LLM with the customer's data instead of training the LLM on the data? What would be the differences? If you feed it the data, then the LLM can't work with the data as well as it could if you trained it on said data? The future is just so damn exciting and I have all these questions popping up so I hope some educated folks can share some insights! Thanks for reading! submitted by /u/MopPanda [link] [comments]  ( 9 min )
    [D] Was trying out Llama2 13B MegaCode2 OASST on my local pc
    https://im3.ezgif.com/tmp/ezgif-3-b05ffc9d5f.gif submitted by /u/theswiftdeveloper [link] [comments]  ( 9 min )
    [D] Comparing Score-Based and Diffusion Models in Theory and Practice
    In theory, it has been demonstrated that score matching models and diffusion models share mathematical similarities. However, in practice, the equivalence between the two approaches may not extend to code implementations. While PyTorch implementations for diffusion models are relatively common, finding equivalent implementations for score-based models can be more challenging. submitted by /u/whysomeonetookmyname [link] [comments]  ( 9 min )
    [D] Industry design patterns for fast-moving ML/DL
    I have been writing ML code (both training and off-the-shelf model inferences) for close to six years now, but mostly in an academic/personal project setting. Now, I find myself spearheading an ML project at a big company, and our backend code base keeps growing, and other people depend on it. There are layers to it, with threads spawning, and dependencies on caches and databases for state sharing. It's more than a pet project - you get the gist. I want to design production-ready architectures that are more robust than piecemeal/make-shift solutions. Do people have resources or suggestions on what established design patterns work in the industry? I have found it hard to find resources just by googling because the pace at which ML research works makes most books/tutorials outdated. Take retrieval augmented generation, for example. Do you store your documents in an elasticsearch store and build indices periodically or do you store them in FAISS? How separated is your retrieval module from your LLM call? Do you host in-house LLM's centrally company-wide or per-project? What has worked for you so far in the industry? submitted by /u/whyusenosqlreddit [link] [comments]  ( 9 min )
    [D] How do you normalize a large taxonomy with lot of similar words.
    I have a large taxonomy of work titles I scraped from linkedin and other career sites. Now I ahve like 90k titles. To reduce them or group them into a sort of 5k unique titles I tried k means clustering but didn't work out good. How do I proceed with this task? Any pointers would be appreciated. submitted by /u/wet_cosplay [link] [comments]  ( 9 min )
    [R] NeO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes
    submitted by /u/KaleidoscopeBest1569 [link] [comments]  ( 9 min )
    [Research] Scholars Program
    Hi everyone. We recently announced the Cohere For AI scholars program, a 8 month full-time paid industry research role to join our team and work on fundamental machine learning at scale. The goal is to support rising stars in ML pursue curiosity driven research w access to large scale engineering resources and mentorship. We have intentionally structured the program to be paid and remote-first so we can support talent all across the world. You will have access to a top tier research team and you can find some of our prior publications here. Our deadline is coming up on September11th. Wanted to make sure this was visible to researchers around the world, and thought many in this forum would be interested. More details below for anyone interested: The Cohere For AI Scholars Program supports the next generation of rising ML stars as they embark on their research journey by providing an alternative point of entry into NLP research. Scholars will have access to a large-scale experimental framework and work alongside some of the best researchers and engineering expertise in the world. Participation is full-time, remote-first and paid. For more details, check out our blog post announcing the Scholars Program launch. Applications are open until September 11, 2023. For those undertaking application, would highly recommend joining our open science discord where we have a highly active FAQ channel for any questions about the program. You can find out more about how to join at cohere.for.ai. Looking forward to reading your applications! submitted by /u/ml_magic_ [link] [comments]  ( 9 min )
    [D] How does a ML model differentiate between Nominal and Ordinal?
    Suppose I have data about cars. In it there are multiple columns like 'Type' which contains "Sedan", "Hatchback", "Convertible" and "Minivan". Then there are 'Color' like "Red", "White", "Blue", etc. And I have used ordinal encoding for 'Types' columns and label for 'Color' column. How will the model know that Types is ordinal while Color is nominal. PS. Suppose I cannot use One Hot encoding as it will increase the no of columns by 20 or 30. submitted by /u/Luffykent [link] [comments]  ( 9 min )
    UK Startup Etcembly Unveils AI-Designed Cancer Immunotherapy [N]
    Etcembly, a UK-based biotech startup, has disclosed one of the first generative AI-designed immunotherapy candidates, known to target a protein present in many cancers. If you want to stay on top of the latest trends and insights in AI and tech, look here first. https://preview.redd.it/n6nwoeloohkb1.jpg?width=1200&format=pjpg&auto=webp&s=defa11280ea75f1e6e26ff9014b7e673b0a181ea Key highlights: Etcembly's AI-designed immunotherapy is innovative: The startup used generative AI to design novel cancer immunotherapy in record time. The therapeutic 'ETC-101' was created and optimized in just 11 months, compared to the traditional two years typically needed. The value of AI makes itself evident: Etcembly's AI engine, EMLy, uses LLMs to predict, design, and validate candidate TCRs, scanning…  ( 10 min )
    Understanding the Constraint of Weight Sums in Loss Functions for Noisy Label Learning [Discussion], [Question]
    Working on a machine-learning task with a dataset full of noisy labels. Thinking of using reweighted loss to tackle the label noise issue. I get that it helps give more importance to clean samples during training. But, about the sum of these weights used in the loss function - should they always add up to 1? What's the reasoning behind this constraint? Can't the weights sum up to any positive value instead? Also, if I intend to assign loss values with probabilities, does the weighted sum still need to be 1? Need help clarifying if my understanding is correct! submitted by /u/Positive_External_27 [link] [comments]  ( 9 min )
    [R] To Compress or Not to Compress- Self-Supervised Learning and Information Theory: A Review
    submitted by /u/hardmaru [link] [comments]  ( 9 min )
    [D] RL[HF] on diffusion models & vision models
    Recently came across: https://datasciencecastnet.home.blog/2023/04/06/a-recipe-for-training-good-generative-models/ and this paper: https://arxiv.org/pdf/2302.08242.pdf The first article is very interesting as it suggests incorporating RLHF in the stack of building a strong diffusion model, the second article demonstrates that it is possible to create stronger computer vision systems with further fine-tuning on metrics (reward functions) that are not differentiable (!), such as mAP for object detection, which I personally found super interesting. These observations makes me think the "general" recipe to build a very good AI model (not only restricted to LLMs) is pretty aligned with what has been done with ChatGPT : 1- supervise fine-tune on a target domain / 2- design & build a reward model / 3- Further align the generations & output with RL Just curious if anyone has any experience with RL + diffusion & vision models? Why do you think this is not super popular yet? submitted by /u/mzitoune [link] [comments]  ( 9 min )
    [D] RL(HF) + diffusion models & vision models
    Recently came across: https://datasciencecastnet.home.blog/2023/04/06/a-recipe-for-training-good-generative-models/ and this paper: https://arxiv.org/pdf/2302.08242.pdf The first article is very interesting as it suggests incorporating RLHF in the stack of building a strong diffusion model, the second article demonstrates that it is possible to create stronger computer vision systems with further fine-tuning on metrics (reward functions) that are not differentiable (!), such as mAP for object detection, which I personally found super interesting. These observations makes me think the "general" recipe to build a very good AI model (any modality) is pretty aligned with what has been done with ChatGPT : 1- supervise fine-tune on a target domain / 2- design / build a reward model / 3- Align the generations / output with RL Just curious if anyone has any experience with RL + diffusion & vision models? Why do you think this is not super popular yet? ​ [link] [comments]  ( 9 min )
    [N] Beating GPT-4 on HumanEval with a Fine-Tuned CodeLlama-34B
    Blog: https://www.phind.com/blog/code-llama-beats-gpt4 Models: https://huggingface.co/Phind/Phind-CodeLlama-34B-Python-v1 https://huggingface.co/Phind/Phind-CodeLlama-34B-v1 submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [P] Llama 2, CodeLlama, and GPT-4 performance: A write-up on the LLM developments and research.
    submitted by /u/seraschka [link] [comments]  ( 9 min )
    [D] Recursive Least Squares vs Gradient Descent for Neural Networks
    I have been captivated by Recursive Least Squares (RLS) methods, particularly the approach that employs error prediction instead of matrix inversion. This method is quite intuitive. Let's consider a scenario where you need to estimate the true effect of four factors (color, gender, age, and weight) on blood sugar. To find the true impact of weight on blood sugar, it's necessary to eliminate the influence of every other factor on weight. This can be accomplished by using simple least squares regression to predict the residual errors recursively, as shown in the diagram below: Removing the effect of all factors on \"weight\" in a recursive manner The fundamental contrast between RLS and Gradient-based methods lies in how errors are distributed across inputs based on their activity, leading to the subsequent update of weights. However, in the case of RLS, all inputs undergo decorrelation before evaluating prediction errors. Comparison between error sharing in RLS and GD This de-correlation can be done in few lines of python code: for i in range(number_of_factors): for j in range(i+1, number_of_factors): wx = np.sum(x[i] * x[j]) / np.sum(x[i]**2) x[j] -= wx * x[i] This approach also bears relevance to predictive coding and can shed light on intriguing neuroscientific findings, such as the increase brain activity during surprising or novel events — attributable to prediction errors. The prediction errors are increasing during the surprising events similar to how brain activity increases. RLS learns very fast but it's still subpar to deep learning when it comes to non-linear hierarchical structures but that is probably because Gradient based methods enjoyed more attention and tinkering from the ML-community. I think RLS methods needs more attention and I have been working on some research projects that uses this method for signal prediction . If you're interested, you can find the source code here: https://github.com/hunar4321/RLS-neural-net ​ submitted by /u/brainxyz [link] [comments]  ( 10 min )
    [P] LLaMA explained: KV-Cache, Rotary Positional Embedding, RMS Norm, Grouped Multi-Query Attention
    submitted by /u/hkproj_ [link] [comments]  ( 9 min )
    [Discussion] What Model to Choose for a NN with a Very Wide Output Layer?
    The input of my neural network consists of 20 features, whereas the output consists of 20,000 of them (predicting a "quantum classical shadow" based on a few parameters: the rotation angle as the input and a few hundreds of shots of random measurements as the output). AFAIK, it's a linear regression problem. What I've tried: - an FCNN (doesn't work good); - Scikit-Learn Lasso (the same results); - MSE regression using Neural Tangents (the same). Any ideas on how to solve this? Thanks a lot in advance! submitted by /u/avpol111 [link] [comments]  ( 9 min )
    [D] What's the best model for iterative ranking determination from pairwise comparisons?
    There are many entities: A, B, C, D... ( B; C > A; D > C; ... A comparison is expensive. Objective: to approximate the absolute order of entities (best entities at the top of the list, worst at the bottom), minimize the number of comparisons The worst solution would be just applying a sorting algorithm, which would require n log n comparisons. I believe an active sampling technique would be required, i.e. select a number of entities with the highest uncertainty, and do comparisons with them, adjust the model, repeat. ChatGPT suggests a Bradley-Terry model and even gives an implementation example. I wonder if there is anything better? submitted by /u/gintrux [link] [comments]  ( 9 min )
    [R] Simpler decision tree implementation question?
    I am trying to implement a decision tree in a very computationally dumb software which can only execute if else statements. If the decision tree is trained some place else and then shared to this software could I deploy the model as a bunch of if else statements. If so how would I know the exact comparison order which would be needed for the if else statements and since this would require to know every detail of the decision tree would I have to make the whole algorithm from scratch so I can access every nook of the decision tree or is there a library which let me access every weight and know what's the weight of each branch? Sorry if it's a dumb question. submitted by /u/ghostfreak999 [link] [comments]  ( 9 min )
    Apple researchers propose a novel method for creating detailed 3D models from images [R]
    Traditional methods of creating 3D models from images often rely on estimating the depth of each pixel in the image, which can result in errors or missing details in areas that are transparent or have low texture. A team of researchers from Apple and UCSB have proposed a new method that directly infers the 3D geometry of a scene using deep neural networks, without requiring any test-time optimization. If you want to stay on top of the latest trends and insights in AI and tech, look here first. https://preview.redd.it/pqxjeafi0dkb1.png?width=748&format=png&auto=webp&s=8daefa852a8805b48cc8586a4a8ec94e5e49123c Why this matters: 3D reconstruction is a fundamental problem in computer vision and graphics: it has many applications in entertainment, education, medicine, and engineering. Howe…  ( 10 min )
  • Open

    'Generative Inbreeding' and its Risk to Human Culture
    submitted by /u/cranberryfix [link] [comments]  ( 9 min )
    OpenAI Just Bought a Game Studio Working on a "Minecraft" Clone
    submitted by /u/cranberryfix [link] [comments]  ( 9 min )
    Best AI companies for you to invest in 2023 (Tabular Comparison included)
    AI is advancing at exponential rate. Its growth is limitless. I have compiled a list of best AI company which are hot stocks right now to invest in 2023. Take a look at them carefully. Meta Platforms Co., Ltd. (META) Meta’s user engagement by 7% in the second quarter. Bank of America has a Buy rating on META stock and a price target of $375 (it closed at $316.56 on Aug. 7). Alphabet Inc. (GOOG, GOOGL) Bank of America has a Buy rating on GOOGL stock and a price target of $146 (it closed at $131.53 on Aug. 7). NVIDIA Corporation (NVDA) Check out Full list ​ submitted by /u/Agitated-Spell3979 [link] [comments]  ( 9 min )
    One-Minute Daily AI News 8/25/2023
    Google DeepMind's new chess engine beats its famous AlphaZero.[1] OpenAI partners with Scale AI to allow companies to fine-tune GPT-3.5.[2] AMD has acquired Mipsology, an AI software company focused on computer interpretations and responses to photos and videos.[3] Former Meta researchers who developed an AI language model for biology have launched a new startup and raised at least $40 million, Forbes has learned.[4] Sources: [1] https://the-decoder.com/google-deepminds-new-chess-engine-beats-its-famous-alphazero/ [2] https://techcrunch.com/2023/08/24/openai-partners-with-scale-ai-to-allow-companies-to-fine-tune-gpt-3-5/ [3] https://www.investopedia.com/amd-acquires-french-ai-software-company-mipsology-7852209 [4] https://www.forbes.com/sites/kenrickcai/2023/08/25/evolutionaryscale-ai-biotech-startup-meta-researchers-funding/?sh=7982a406140c submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    This is so impressive...Freddie Mercury AI as Mickael Jackson - Thriller
    submitted by /u/the_anonymizer [link] [comments]  ( 9 min )
  • Open

    Upside Down Reinforcement Learning Implementation
    I recently implemented UDRL and just published it. If anyone finds it useful, feel free to check it out: https://github.com/mphe/upside-down-reinforcement-learning. There are many other implementations out there, but most of them are difficult to extend and maintain, due to being written in a sloppy manner, or are incorrect, e.g. not using multiplicative interactions or contain smaller bugs and issues. This project aims to fix these issues, while potentially improving performance, providing a proper OOP interface, and reusing code from Stable Baselines 3 where applicable. Furthermore, the algorithm has been extended to support additional features, like multi-threading, which speeds up the training time immensely. It also provides an interface similar to SB algorithms, so it can be used mostly analogously. For more information, see the Github page. Contributions are welcome! submitted by /u/mphe_ [link] [comments]  ( 9 min )
    Multi-Agent RL where agents' actions are dependent on nearby agent's actions
    I am working to design an multi-agent reinforcement learning agent, where the agents that are spatially close are connected and the information is shared, which will be done through a convolution process. However, when convoluting the nearby agents' observations, I also wish nearby agents' action values to be part of the local observation that will be convoluted, however this would cause a dilemma as for an agent to choose and action, it will have to know other agents' actions but the other agents would have to know this agent's value for deciding the value. Are there MARL methods that can help me fix this problem? submitted by /u/LeSUTHU [link] [comments]  ( 9 min )
    Advice on understanding intuition behind RL algorithms.
    I am trying to understand Policy Iteration from the book "Reinforcement learning an introduction". I understood the pseudo code and applied it using python. But still I feel like I don't have a intuitive understanding of Policy Iteration. Like why it works? I know how it works. Any advice on how to get an intuitive understanding of RL algorithms? I reread the policy iteration multiple times, but still feel like I don't understand it. submitted by /u/mono1110 [link] [comments]  ( 9 min )
  • Open

    A small programming language
    Paul Graham said “Programming languages teach you not to want what they don’t provide.” He meant that as a negative: programmers using less expressive languages don’t know what they’re missing. But you could also take that as a positive: using a simple language can teach you that you don’t need features you thought you needed. […] A small programming language first appeared on John D. Cook.  ( 6 min )
    Quadrature rules and an impossibility theorem
    Many numerical integration formulas over a finite interval have the form That is, the integral on the left can be approximated by evaluating the integrand f at particular nodes and taking the weighted sum, and the error is some multiple of a derivative of f evaluated at a point in the interval [a, b]. This […] Quadrature rules and an impossibility theorem first appeared on John D. Cook.  ( 6 min )
  • Open

    Twitter Community / Machine Learning
    submitted by /u/x9182 [link] [comments]  ( 9 min )
    Zoomposium with Professor Dr. John-Dylan Haynes: "In search of the code of the brain"
    Zoomposium with Professor Dr. John-Dylan Haynes: "In search of the code of the brain" In this new episode of our "Zoomposium Series" on the topic of "Brain Research", my colleague Axel Stöcker from the "Blog der großen Fragen" and I have managed to win the well-known and renowned brain researcher and psychologist Professor Dr. John-Dylan Haynes for an interview. John-Dylan Haynes has been a professor of theory and analysis of long-range brain signals at the Bernstein Center for Computational Neuroscience and the Berlin Center for Advanced Neuroimaging (BCAN) at Charité and Humboldt University in Berlin since 2006. There, Professor Haynes and his team are "In Search of the Brain's Code". In order to crack this, larger amounts of data are collected from the functional magnetic resonance i…  ( 10 min )
    Deep Neural Nets: 33 years ago and 33 years from now
    submitted by /u/nickb [link] [comments]  ( 9 min )
  • Open

    Probabilistic load forecasting with Reservoir Computing. (arXiv:2308.12844v1 [cs.LG])
    Some applications of deep learning require not only to provide accurate results but also to quantify the amount of confidence in their prediction. The management of an electric power grid is one of these cases: to avoid risky scenarios, decision-makers need both precise and reliable forecasts of, for example, power loads. For this reason, point forecasts are not enough hence it is necessary to adopt methods that provide an uncertainty quantification. This work focuses on reservoir computing as the core time series forecasting method, due to its computational efficiency and effectiveness in predicting time series. While the RC literature mostly focused on point forecasting, this work explores the compatibility of some popular uncertainty quantification methods with the reservoir setting. Both Bayesian and deterministic approaches to uncertainty assessment are evaluated and compared in terms of their prediction accuracy, computational resource efficiency and reliability of the estimated uncertainty, based on a set of carefully chosen performance metrics.
    DLIP: Distilling Language-Image Pre-training. (arXiv:2308.12956v1 [cs.CV])
    Vision-Language Pre-training (VLP) shows remarkable progress with the assistance of extremely heavy parameters, which challenges deployment in real applications. Knowledge distillation is well recognized as the essential procedure in model compression. However, existing knowledge distillation techniques lack an in-depth investigation and analysis of VLP, and practical guidelines for VLP-oriented distillation are still not yet explored. In this paper, we present DLIP, a simple yet efficient Distilling Language-Image Pre-training framework, through which we investigate how to distill a light VLP model. Specifically, we dissect the model distillation from multiple dimensions, such as the architecture characteristics of different modules and the information transfer of different modalities. We conduct comprehensive experiments and provide insights on distilling a light but performant VLP model. Experimental results reveal that DLIP can achieve a state-of-the-art accuracy/efficiency trade-off across diverse cross-modal tasks, e.g., image-text retrieval, image captioning and visual question answering. For example, DLIP compresses BLIP by 1.9x, from 213M to 108M parameters, while achieving comparable or better performance. Furthermore, DLIP succeeds in retaining more than 95% of the performance with 22.4% parameters and 24.8% FLOPs compared to the teacher model and accelerates inference speed by 2.7x.
    Equal Treatment: Measuring Fairness using Explanation Distributions. (arXiv:2303.08040v2 [cs.LG] UPDATED)
    Liberalism-oriented political philosophy reasons that all individuals should be treated equally independently of their protected characteristics. Related work in machine learning has translated the concept of equal treatment into terms of equal outcome and measured it as demographic parity (also called statistical parity). Our analysis reveals that the two concepts of equal outcome and equal treatment diverge; therefore, demographic parity does not faithfully represent the notion of equal treatment. We propose a new formalization for equal treatment by (i) considering the influence of feature values on predictions, such as computed by Shapley values explaining classifications, (ii) defining distributions of explanations, and (iii) comparing explanation distributions between populations with different protected characteristics. We show the theoretical properties of our notion of equal treatment and devise a classifier two-sample test based on the AUC of an equal treatment inspector. We study our formalization of equal treatment on synthetic and natural data. We release explanationspace, an open-source Python package with methods and tutorials.
    Diagnosing Infeasible Optimization Problems Using Large Language Models. (arXiv:2308.12923v1 [cs.HC])
    Decision-making problems can be represented as mathematical optimization models, finding wide applications in fields such as economics, engineering and manufacturing, transportation, and health care. Optimization models are mathematical abstractions of the problem of making the best decision while satisfying a set of requirements or constraints. One of the primary barriers to deploying these models in practice is the challenge of helping practitioners understand and interpret such models, particularly when they are infeasible, meaning no decision satisfies all the constraints. Existing methods for diagnosing infeasible optimization models often rely on expert systems, necessitating significant background knowledge in optimization. In this paper, we introduce OptiChat, a first-of-its-kind natural language-based system equipped with a chatbot GUI for engaging in interactive conversations about infeasible optimization models. OptiChat can provide natural language descriptions of the optimization model itself, identify potential sources of infeasibility, and offer suggestions to make the model feasible. The implementation of OptiChat is built on GPT-4, which interfaces with an optimization solver to identify the minimal subset of constraints that render the entire optimization problem infeasible, also known as the Irreducible Infeasible Subset (IIS). We utilize few-shot learning, expert chain-of-thought, key-retrieve, and sentiment prompts to enhance OptiChat's reliability. Our experiments demonstrate that OptiChat assists both expert and non-expert users in improving their understanding of the optimization models, enabling them to quickly identify the sources of infeasibility.
    An Accelerated Block Proximal Framework with Adaptive Momentum for Nonconvex and Nonsmooth Optimization. (arXiv:2308.12126v2 [math.OC] UPDATED)
    We propose an accelerated block proximal linear framework with adaptive momentum (ABPL$^+$) for nonconvex and nonsmooth optimization. We analyze the potential causes of the extrapolation step failing in some algorithms, and resolve this issue by enhancing the comparison process that evaluates the trade-off between the proximal gradient step and the linear extrapolation step in our algorithm. Furthermore, we extends our algorithm to any scenario involving updating block variables with positive integers, allowing each cycle to randomly shuffle the update order of the variable blocks. Additionally, under mild assumptions, we prove that ABPL$^+$ can monotonically decrease the function value without strictly restricting the extrapolation parameters and step size, demonstrates the viability and effectiveness of updating these blocks in a random order, and we also more obviously and intuitively demonstrate that the derivative set of the sequence generated by our algorithm is a critical point set. Moreover, we demonstrate the global convergence as well as the linear and sublinear convergence rates of our algorithm by utilizing the Kurdyka-Lojasiewicz (K{\L}) condition. To enhance the effectiveness and flexibility of our algorithm, we also expand the study to the imprecise version of our algorithm and construct an adaptive extrapolation parameter strategy, which improving its overall performance. We apply our algorithm to multiple non-negative matrix factorization with the $\ell_0$ norm, nonnegative tensor decomposition with the $\ell_0$ norm, and perform extensive numerical experiments to validate its effectiveness and efficiency.
    Unsupervised Manifold Linearizing and Clustering. (arXiv:2301.01805v2 [cs.LG] UPDATED)
    We consider the problem of simultaneously clustering and learning a linear representation of data lying close to a union of low-dimensional manifolds, a fundamental task in machine learning and computer vision. When the manifolds are assumed to be linear subspaces, this reduces to the classical problem of subspace clustering, which has been studied extensively over the past two decades. Unfortunately, many real-world datasets such as natural images can not be well approximated by linear subspaces. On the other hand, numerous works have attempted to learn an appropriate transformation of the data, such that data is mapped from a union of general non-linear manifolds to a union of linear subspaces (with points from the same manifold being mapped to the same subspace). However, many existing works have limitations such as assuming knowledge of the membership of samples to clusters, requiring high sampling density, or being shown theoretically to learn trivial representations. In this paper, we propose to optimize the Maximal Coding Rate Reduction metric with respect to both the data representation and a novel doubly stochastic cluster membership, inspired by state-of-the-art subspace clustering results. We give a parameterization of such a representation and membership, allowing efficient mini-batching and one-shot initialization. Experiments on CIFAR-10, -20, -100, and TinyImageNet-200 datasets show that the proposed method is much more accurate and scalable than state-of-the-art deep clustering methods, and further learns a latent linear representation of the data.
    Wasserstein Geodesic Generator for Conditional Distributions. (arXiv:2308.10145v2 [stat.ML] UPDATED)
    Generating samples given a specific label requires estimating conditional distributions. We derive a tractable upper bound of the Wasserstein distance between conditional distributions to lay the theoretical groundwork to learn conditional distributions. Based on this result, we propose a novel conditional generation algorithm where conditional distributions are fully characterized by a metric space defined by a statistical distance. We employ optimal transport theory to propose the Wasserstein geodesic generator, a new conditional generator that learns the Wasserstein geodesic. The proposed method learns both conditional distributions for observed domains and optimal transport maps between them. The conditional distributions given unobserved intermediate domains are on the Wasserstein geodesic between conditional distributions given two observed domain labels. Experiments on face images with light conditions as domain labels demonstrate the efficacy of the proposed method.
    FlexFringe: Modeling Software Behavior by Learning Probabilistic Automata. (arXiv:2203.16331v3 [cs.LG] UPDATED)
    We present the efficient implementations of probabilistic deterministic finite automaton learning methods available in FlexFringe. These implement well-known strategies for state-merging including several modifications to improve their performance in practice. We show experimentally that these algorithms obtain competitive results and significant improvements over a default implementation. We also demonstrate how to use FlexFringe to learn interpretable models from software logs and use these for anomaly detection. Although less interpretable, we show that learning smaller more convoluted models improves the performance of FlexFringe on anomaly detection, outperforming an existing solution based on neural nets.
    Masked Feature Modelling: Feature Masking for the Unsupervised Pre-training of a Graph Attention Network Block for Bottom-up Video Event Recognition. (arXiv:2308.12673v1 [cs.CV])
    In this paper, we introduce Masked Feature Modelling (MFM), a novel approach for the unsupervised pre-training of a Graph Attention Network (GAT) block. MFM utilizes a pretrained Visual Tokenizer to reconstruct masked features of objects within a video, leveraging the MiniKinetics dataset. We then incorporate the pre-trained GAT block into a state-of-the-art bottom-up supervised video-event recognition architecture, ViGAT, to improve the model's starting point and overall accuracy. Experimental evaluations on the YLI-MED dataset demonstrate the effectiveness of MFM in improving event recognition performance.
    LR-XFL: Logical Reasoning-based Explainable Federated Learning. (arXiv:2308.12681v1 [cs.AI])
    Federated learning (FL) is an emerging approach for training machine learning models collaboratively while preserving data privacy. The need for privacy protection makes it difficult for FL models to achieve global transparency and explainability. To address this limitation, we incorporate logic-based explanations into FL by proposing the Logical Reasoning-based eXplainable Federated Learning (LR-XFL) approach. Under LR-XFL, FL clients create local logic rules based on their local data and send them, along with model updates, to the FL server. The FL server connects the local logic rules through a proper logical connector that is derived based on properties of client data, without requiring access to the raw data. In addition, the server also aggregates the local model updates with weight values determined by the quality of the clients' local data as reflected by their uploaded logic rules. The results show that LR-XFL outperforms the most relevant baseline by 1.19%, 5.81% and 5.41% in terms of classification accuracy, rule accuracy and rule fidelity, respectively. The explicit rule evaluation and expression under LR-XFL enable human experts to validate and correct the rules on the server side, hence improving the global FL model's robustness to errors. It has the potential to enhance the transparency of FL models for areas like healthcare and finance where both data privacy and explainability are important.
    Towards Automated Animal Density Estimation with Acoustic Spatial Capture-Recapture. (arXiv:2308.12859v1 [cs.SD])
    Passive acoustic monitoring can be an effective way of monitoring wildlife populations that are acoustically active but difficult to survey visually. Digital recorders allow surveyors to gather large volumes of data at low cost, but identifying target species vocalisations in these data is non-trivial. Machine learning (ML) methods are often used to do the identification. They can process large volumes of data quickly, but they do not detect all vocalisations and they do generate some false positives (vocalisations that are not from the target species). Existing wildlife abundance survey methods have been designed specifically to deal with the first of these mistakes, but current methods of dealing with false positives are not well-developed. They do not take account of features of individual vocalisations, some of which are more likely to be false positives than others. We propose three methods for acoustic spatial capture-recapture inference that integrate individual-level measures of confidence from ML vocalisation identification into the likelihood and hence integrate ML uncertainty into inference. The methods include a mixture model in which species identity is a latent variable. We test the methods by simulation and find that in a scenario based on acoustic data from Hainan gibbons, in which ignoring false positives results in 17% positive bias, our methods give negligible bias and coverage probabilities that are close to the nominal 95% level.
    Leveraging Global Binary Masks for Structure Segmentation in Medical Images. (arXiv:2205.09107v2 [eess.IV] UPDATED)
    Deep learning (DL) models for medical image segmentation are highly influenced by intensity variations of input images and lack generalization due to primarily utilizing pixels' intensity information for inference. Acquiring sufficient training data is another challenge limiting models' applications. We proposed to leverage the consistency of organs' anatomical shape and position information in medical images. We introduced a framework leveraging recurring anatomical patterns through global binary masks for organ segmentation. Two scenarios were studied.1) Global binary masks were the only model's (i.e. U-Net) input, forcing exclusively encoding organs' position and shape information for segmentation/localization.2) Global binary masks were incorporated as an additional channel functioning as position/shape clues to mitigate training data scarcity. Two datasets of the brain and heart CT images with their ground-truth were split into (26:10:10) and (12:3:5) for training, validation, and test respectively. Training exclusively on global binary masks led to Dice scores of 0.77(0.06) and 0.85(0.04), with the average Euclidian distance of 3.12(1.43)mm and 2.5(0.93)mm relative to the center of mass of the ground truth for the brain and heart structures respectively. The outcomes indicate that a surprising degree of position and shape information is encoded through global binary masks. Incorporating global binary masks led to significantly higher accuracy relative to the model trained on only CT images in small subsets of training data; the performance improved by 4.3-125.3% and 1.3-48.1% for 1-8 training cases of the brain and heart datasets respectively. The findings imply the advantages of utilizing global binary masks for building generalizable models and to compensate for training data scarcity.
    Real-time Detection of AI-Generated Speech for DeepFake Voice Conversion. (arXiv:2308.12734v1 [cs.SD])
    There are growing implications surrounding generative AI in the speech domain that enable voice cloning and real-time voice conversion from one individual to another. This technology poses a significant ethical threat and could lead to breaches of privacy and misrepresentation, thus there is an urgent need for real-time detection of AI-generated speech for DeepFake Voice Conversion. To address the above emerging issues, the DEEP-VOICE dataset is generated in this study, comprised of real human speech from eight well-known figures and their speech converted to one another using Retrieval-based Voice Conversion. Presenting as a binary classification problem of whether the speech is real or AI-generated, statistical analysis of temporal audio features through t-testing reveals that there are significantly different distributions. Hyperparameter optimisation is implemented for machine learning models to identify the source of speech. Following the training of 208 individual machine learning models over 10-fold cross validation, it is found that the Extreme Gradient Boosting model can achieve an average classification accuracy of 99.3% and can classify speech in real-time, at around 0.004 milliseconds given one second of speech. All data generated for this study is released publicly for future research on AI speech detection.
    Individual Privacy Accounting with Gaussian Differential Privacy. (arXiv:2209.15596v2 [cs.CR] UPDATED)
    Individual privacy accounting enables bounding differential privacy (DP) loss individually for each participant involved in the analysis. This can be informative as often the individual privacy losses are considerably smaller than those indicated by the DP bounds that are based on considering worst-case bounds at each data access. In order to account for the individual privacy losses in a principled manner, we need a privacy accountant for adaptive compositions of randomised mechanisms, where the loss incurred at a given data access is allowed to be smaller than the worst-case loss. This kind of analysis has been carried out for the R\'enyi differential privacy (RDP) by Feldman and Zrnic (2021), however not yet for the so-called optimal privacy accountants. We make first steps in this direction by providing a careful analysis using the Gaussian differential privacy which gives optimal bounds for the Gaussian mechanism, one of the most versatile DP mechanisms. This approach is based on determining a certain supermartingale for the hockey-stick divergence and on extending the R\'enyi divergence-based fully adaptive composition results by Feldman and Zrnic. We also consider measuring the individual $(\varepsilon,\delta)$-privacy losses using the so-called privacy loss distributions. With the help of the Blackwell theorem, we can then make use of the RDP analysis to construct an approximative individual $(\varepsilon,\delta)$-accountant.
    Constrained Stein Variational Trajectory Optimization. (arXiv:2308.12110v1 [cs.RO] CROSS LISTED)
    We present Constrained Stein Variational Trajectory Optimization (CSVTO), an algorithm for performing trajectory optimization with constraints on a set of trajectories in parallel. We frame constrained trajectory optimization as a novel form of constrained functional minimization over trajectory distributions, which avoids treating the constraints as a penalty in the objective and allows us to generate diverse sets of constraint-satisfying trajectories. Our method uses Stein Variational Gradient Descent (SVGD) to find a set of particles that approximates a distribution over low-cost trajectories while obeying constraints. CSVTO is applicable to problems with arbitrary equality and inequality constraints and includes a novel particle resampling step to escape local minima. By explicitly generating diverse sets of trajectories, CSVTO is better able to avoid poor local minima and is more robust to initialization. We demonstrate that CSVTO outperforms baselines in challenging highly-constrained tasks, such as a 7DoF wrench manipulation task, where CSVTO succeeds in 20/20 trials vs 13/20 for the closest baseline. Our results demonstrate that generating diverse constraint-satisfying trajectories improves robustness to disturbances and initialization over baselines.
    Persistent learning signals and working memory without continuous attractors. (arXiv:2308.12585v1 [q-bio.NC])
    Neural dynamical systems with stable attractor structures, such as point attractors and continuous attractors, are hypothesized to underlie meaningful temporal behavior that requires working memory. However, working memory may not support useful learning signals necessary to adapt to changes in the temporal structure of the environment. We show that in addition to the continuous attractors that are widely implicated, periodic and quasi-periodic attractors can also support learning arbitrarily long temporal relationships. Unlike the continuous attractors that suffer from the fine-tuning problem, the less explored quasi-periodic attractors are uniquely qualified for learning to produce temporally structured behavior. Our theory has broad implications for the design of artificial learning systems and makes predictions about observable signatures of biological neural dynamics that can support temporal dependence learning and working memory. Based on our theory, we developed a new initialization scheme for artificial recurrent neural networks that outperforms standard methods for tasks that require learning temporal dynamics. Moreover, we propose a robust recurrent memory mechanism for integrating and maintaining head direction without a ring attractor.
    To Compress or Not to Compress- Self-Supervised Learning and Information Theory: A Review. (arXiv:2304.09355v4 [cs.LG] UPDATED)
    \begin{abstract} Deep neural networks excel in supervised learning tasks but are constrained by the need for extensive labeled data. Self-supervised learning emerges as a promising alternative, allowing models to learn without explicit labels. Information theory, and notably the information bottleneck principle, has been pivotal in shaping deep neural networks. This principle focuses on optimizing the trade-off between compression and preserving relevant information, providing a foundation for efficient network design in supervised contexts. However, its precise role and adaptation in self-supervised learning remain unclear. In this work, we scrutinize various self-supervised learning approaches from an information-theoretic perspective, introducing a unified framework that encapsulates the self-supervised information-theoretic learning problem. We weave together existing research into a cohesive narrative, delve into contemporary self-supervised methodologies, and spotlight potential research avenues and inherent challenges. Additionally, we discuss the empirical evaluation of information-theoretic quantities and their estimation methods. Overall, this paper furnishes an exhaustive review of the intersection of information theory, self-supervised learning, and deep neural networks.
    Natural Language is All a Graph Needs. (arXiv:2308.07134v3 [cs.CL] UPDATED)
    The emergence of large-scale pre-trained language models, such as ChatGPT, has revolutionized various research fields in artificial intelligence. Transformers-based large language models (LLMs) have gradually replaced CNNs and RNNs to unify fields of computer vision and natural language processing. Compared with the data that exists relatively independently such as images, videos or texts, graph is a type of data that contains rich structural and relational information. Meanwhile, natural language, as one of the most expressive mediums, excels in describing complex structures. However, existing work on incorporating graph learning problems into the generative language modeling framework remains very limited. As the importance of large language models continues to grow, it becomes essential to explore whether LLMs can also replace GNNs as the foundation model for graphs. In this paper, we propose InstructGLM (Instruction-finetuned Graph Language Model), systematically design highly scalable prompts based on natural language instructions, and use natural language to describe the geometric structure and node features of the graph for instruction tuning an LLM to perform learning and inference on graphs in a generative manner. Our method exceeds all competitive GNN baselines on ogbn-arxiv, Cora and PubMed datasets, which demonstrates the effectiveness of our method and sheds light on generative large language models as the foundation model for graph machine learning.
    POLCA: Power Oversubscription in LLM Cloud Providers. (arXiv:2308.12908v1 [cs.DC])
    Recent innovation in large language models (LLMs), and their myriad use-cases have rapidly driven up the compute capacity demand for datacenter GPUs. Several cloud providers and other enterprises have made substantial plans of growth in their datacenters to support these new workloads. One of the key bottleneck resources in datacenters is power, and given the increasing model sizes of LLMs, they are becoming increasingly power intensive. In this paper, we show that there is a significant opportunity to oversubscribe power in LLM clusters. Power oversubscription improves the power efficiency of these datacenters, allowing more deployable servers per datacenter, and reduces the deployment time, since building new datacenters is slow. We extensively characterize the power consumption patterns of a variety of LLMs and their configurations. We identify the differences between the inference and training power consumption patterns. Based on our analysis of these LLMs, we claim that the average and peak power utilization in LLM clusters for inference should not be very high. Our deductions align with the data from production LLM clusters, revealing that inference workloads offer substantial headroom for power oversubscription. However, the stringent set of telemetry and controls that GPUs offer in a virtualized environment, makes it challenging to have a reliable and robust power oversubscription mechanism. We propose POLCA, our framework for power oversubscription that is robust, reliable, and readily deployable for GPU clusters. Using open-source models to replicate the power patterns observed in production, we simulate POLCA and demonstrate that we can deploy 30% more servers in the same GPU cluster for inference, with minimal performance loss
    Self-Supervised Training with Autoencoders for Visual Anomaly Detection. (arXiv:2206.11723v4 [cs.CV] UPDATED)
    Deep autoencoders provide an effective tool for learning non-linear dimensionality reduction in an unsupervised way. Recently, they have been used for the task of anomaly detection in the visual domain. By optimizing for the reconstruction error using anomaly-free examples, the common belief is that a corresponding network should fail to accurately reconstruct anomalous regions in the application phase. This goal is typically addressed by controlling the capacity of the network, either by reducing the size of the bottleneck layer or by enforcing sparsity constraints on the activations. However, neither of these techniques does explicitly penalize reconstruction of anomalous signals often resulting in poor detection. We tackle this problem by adapting a self-supervised learning regime that allows the use of discriminative information during training but focuses on the data manifold of normal examples. We emphasize that inference with our approach is very efficient during training and prediction requiring a single forward pass for each input image. Our experiments on the MVTec AD dataset demonstrate high detection and localization performance. On the texture-subset, in particular, our approach consistently outperforms recent anomaly detection methods by a significant margin.
    Beyond Document Page Classification: Design, Datasets, and Challenges. (arXiv:2308.12896v1 [cs.CV])
    This paper highlights the need to bring document classification benchmarking closer to real-world applications, both in the nature of data tested ($X$: multi-channel, multi-paged, multi-industry; $Y$: class distributions and label set variety) and in classification tasks considered ($f$: multi-page document, page stream, and document bundle classification, ...). We identify the lack of public multi-page document classification datasets, formalize different classification tasks arising in application scenarios, and motivate the value of targeting efficient multi-page document representations. An experimental study on proposed multi-page document classification datasets demonstrates that current benchmarks have become irrelevant and need to be updated to evaluate complete documents, as they naturally occur in practice. This reality check also calls for more mature evaluation methodologies, covering calibration evaluation, inference complexity (time-memory), and a range of realistic distribution shifts (e.g., born-digital vs. scanning noise, shifting page order). Our study ends on a hopeful note by recommending concrete avenues for future improvements.}
    Exact Bayesian Inference on Discrete Models via Probability Generating Functions: A Probabilistic Programming Approach. (arXiv:2305.17058v2 [cs.PL] UPDATED)
    We present an exact Bayesian inference method for discrete statistical models, which can find exact solutions to many discrete inference problems, even with infinite support and continuous priors. To express such models, we introduce a probabilistic programming language that supports discrete and continuous sampling, discrete observations, affine functions, (stochastic) branching, and conditioning on events. Our key tool is probability generating functions: they provide a compact closed-form representation of distributions that are definable by programs, thus enabling the exact computation of posterior probabilities, expectation, variance, and higher moments. Our inference method is provably correct, fully automated and uses automatic differentiation (specifically, Taylor polynomials), but does not require computer algebra. Our experiments show that its performance on a range of real-world examples is competitive with approximate Monte Carlo methods, while avoiding approximation errors.
    Pareto Invariant Representation Learning for Multimedia Recommendation. (arXiv:2308.04706v2 [cs.IR] UPDATED)
    Multimedia recommendation involves personalized ranking tasks, where multimedia content is usually represented using a generic encoder. However, these generic representations introduce spurious correlations that fail to reveal users' true preferences. Existing works attempt to alleviate this problem by learning invariant representations, but overlook the balance between independent and identically distributed (IID) and out-of-distribution (OOD) generalization. In this paper, we propose a framework called Pareto Invariant Representation Learning (PaInvRL) to mitigate the impact of spurious correlations from an IID-OOD multi-objective optimization perspective, by learning invariant representations (intrinsic factors that attract user attention) and variant representations (other factors) simultaneously. Specifically, PaInvRL includes three iteratively executed modules: (i) heterogeneous identification module, which identifies the heterogeneous environments to reflect distributional shifts for user-item interactions; (ii) invariant mask generation module, which learns invariant masks based on the Pareto-optimal solutions that minimize the adaptive weighted Invariant Risk Minimization (IRM) and Empirical Risk (ERM) losses; (iii) convert module, which generates both variant representations and item-invariant representations for training a multi-modal recommendation model that mitigates spurious correlations and balances the generalization performance within and cross the environmental distributions. We compare the proposed PaInvRL with state-of-the-art recommendation models on three public multimedia recommendation datasets (Movielens, Tiktok, and Kwai), and the experimental results validate the effectiveness of PaInvRL for both within- and cross-environmental learning.
    CDAN: Convolutional Dense Attention-guided Network for Low-light Image Enhancement. (arXiv:2308.12902v1 [cs.CV])
    Low-light images, characterized by inadequate illumination, pose challenges of diminished clarity, muted colors, and reduced details. Low-light image enhancement, an essential task in computer vision, aims to rectify these issues by improving brightness, contrast, and overall perceptual quality, thereby facilitating accurate analysis and interpretation. This paper introduces the Convolutional Dense Attention-guided Network (CDAN), a novel solution for enhancing low-light images. CDAN integrates an autoencoder-based architecture with convolutional and dense blocks, complemented by an attention mechanism and skip connections. This architecture ensures efficient information propagation and feature learning. Furthermore, a dedicated post-processing phase refines color balance and contrast. Our approach demonstrates notable progress compared to state-of-the-art results in low-light image enhancement, showcasing its robustness across a wide range of challenging scenarios. Our model performs remarkably on benchmark datasets, effectively mitigating under-exposure and proficiently restoring textures and colors in diverse low-light scenarios. This achievement underscores CDAN's potential for diverse computer vision tasks, notably enabling robust object detection and recognition in challenging low-light conditions.
    Bridging the Gap between Chemical Reaction Pretraining and Conditional Molecule Generation with a Unified Model. (arXiv:2303.06965v3 [cs.LG] UPDATED)
    Chemical reactions are the fundamental building blocks of drug design and organic chemistry research. In recent years, there has been a growing need for a large-scale deep-learning framework that can efficiently capture the basic rules of chemical reactions. In this paper, we have proposed a unified framework that addresses both the reaction representation learning and molecule generation tasks, which allows for a more holistic approach. Inspired by the organic chemistry mechanism, we develop a novel pretraining framework that enables us to incorporate inductive biases into the model. Our framework achieves state-of-the-art results on challenging downstream tasks. By possessing chemical knowledge, our generative framework overcome the limitations of current molecule generation models that rely on a small number of reaction templates. In the extensive experiments, our model generates synthesizable drug-like structures of high quality. Overall, our work presents a significant step toward a large-scale deep-learning framework for a variety of reaction-based applications.
    A Survey on Dataset Distillation: Approaches, Applications and Future Directions. (arXiv:2305.01975v3 [cs.LG] UPDATED)
    Dataset distillation is attracting more attention in machine learning as training sets continue to grow and the cost of training state-of-the-art models becomes increasingly high. By synthesizing datasets with high information density, dataset distillation offers a range of potential applications, including support for continual learning, neural architecture search, and privacy protection. Despite recent advances, we lack a holistic understanding of the approaches and applications. Our survey aims to bridge this gap by first proposing a taxonomy of dataset distillation, characterizing existing approaches, and then systematically reviewing the data modalities, and related applications. In addition, we summarize the challenges and discuss future directions for this field of research.
    Towards Efficient and Comprehensive Urban Spatial-Temporal Prediction: A Unified Library and Performance Benchmark. (arXiv:2304.14343v5 [cs.LG] UPDATED)
    As deep learning technology advances and more urban spatial-temporal data accumulates, an increasing number of deep learning models are being proposed to solve urban spatial-temporal prediction problems. However, there are limitations in the existing field, including open-source data being in various formats and difficult to use, few papers making their code and data openly available, and open-source models often using different frameworks and platforms, making comparisons challenging. A standardized framework is urgently needed to implement and evaluate these methods. To address these issues, we provide a comprehensive review of urban spatial-temporal prediction and propose a unified storage format for spatial-temporal data called atomic files. We also propose LibCity, an open-source library that offers researchers a credible experimental tool and a convenient development framework. In this library, we have reproduced 65 spatial-temporal prediction models and collected 55 spatial-temporal datasets, allowing researchers to conduct comprehensive experiments conveniently. Using LibCity, we conducted a series of experiments to validate the effectiveness of different models and components, and we summarized promising future technology developments and research directions for spatial-temporal prediction. By enabling fair model comparisons, designing a unified data storage format, and simplifying the process of developing new models, LibCity is poised to make significant contributions to the spatial-temporal prediction field.
    Fast Adversarial Training with Smooth Convergence. (arXiv:2308.12857v1 [cs.LG])
    Fast adversarial training (FAT) is beneficial for improving the adversarial robustness of neural networks. However, previous FAT work has encountered a significant issue known as catastrophic overfitting when dealing with large perturbation budgets, \ie the adversarial robustness of models declines to near zero during training. To address this, we analyze the training process of prior FAT work and observe that catastrophic overfitting is accompanied by the appearance of loss convergence outliers. Therefore, we argue a moderately smooth loss convergence process will be a stable FAT process that solves catastrophic overfitting. To obtain a smooth loss convergence process, we propose a novel oscillatory constraint (dubbed ConvergeSmooth) to limit the loss difference between adjacent epochs. The convergence stride of ConvergeSmooth is introduced to balance convergence and smoothing. Likewise, we design weight centralization without introducing additional hyperparameters other than the loss balance coefficient. Our proposed methods are attack-agnostic and thus can improve the training stability of various FAT techniques. Extensive experiments on popular datasets show that the proposed methods efficiently avoid catastrophic overfitting and outperform all previous FAT methods. Code is available at \url{https://github.com/FAT-CS/ConvergeSmooth}.
    The SWAX Benchmark: Attacking Biometric Systems with Wax Figures. (arXiv:1910.09642v1 [cs.CV] CROSS LISTED)
    A face spoofing attack occurs when an intruder attempts to impersonate someone who carries a gainful authentication clearance. It is a trending topic due to the increasing demand for biometric authentication on mobile devices, high-security areas, among others. This work introduces a new database named Sense Wax Attack dataset (SWAX), comprised of real human and wax figure images and videos that endorse the problem of face spoofing detection. The dataset consists of more than 1800 face images and 110 videos of 55 people/waxworks, arranged in training, validation and test sets with a large range in expression, illumination and pose variations. Experiments performed with baseline methods show that despite the progress in recent years, advanced spoofing methods are still vulnerable to high-quality violation attempts.
    Open-set Face Recognition using Ensembles trained on Clustered Data. (arXiv:2308.07445v1 [cs.CV] CROSS LISTED)
    Open-set face recognition describes a scenario where unknown subjects, unseen during the training stage, appear on test time. Not only it requires methods that accurately identify individuals of interest, but also demands approaches that effectively deal with unfamiliar faces. This work details a scalable open-set face identification approach to galleries composed of hundreds and thousands of subjects. It is composed of clustering and an ensemble of binary learning algorithms that estimates when query face samples belong to the face gallery and then retrieves their correct identity. The approach selects the most suitable gallery subjects and uses the ensemble to improve prediction performance. We carry out experiments on well-known LFW and YTF benchmarks. Results show that competitive performance can be achieved even when targeting scalability.
    Improving Sample Quality of Diffusion Models Using Self-Attention Guidance. (arXiv:2210.00939v6 [cs.CV] UPDATED)
    Denoising diffusion models (DDMs) have attracted attention for their exceptional generation quality and diversity. This success is largely attributed to the use of class- or text-conditional diffusion guidance methods, such as classifier and classifier-free guidance. In this paper, we present a more comprehensive perspective that goes beyond the traditional guidance methods. From this generalized perspective, we introduce novel condition- and training-free strategies to enhance the quality of generated images. As a simple solution, blur guidance improves the suitability of intermediate samples for their fine-scale information and structures, enabling diffusion models to generate higher quality samples with a moderate guidance scale. Improving upon this, Self-Attention Guidance (SAG) uses the intermediate self-attention maps of diffusion models to enhance their stability and efficacy. Specifically, SAG adversarially blurs only the regions that diffusion models attend to at each iteration and guides them accordingly. Our experimental results show that our SAG improves the performance of various diffusion models, including ADM, IDDPM, Stable Diffusion, and DiT. Moreover, combining SAG with conventional guidance methods leads to further improvement.
    HyperTab: Hypernetwork Approach for Deep Learning on Small Tabular Datasets. (arXiv:2304.03543v2 [cs.LG] UPDATED)
    Deep learning has achieved impressive performance in many domains, such as computer vision and natural language processing, but its advantage over classical shallow methods on tabular datasets remains questionable. It is especially challenging to surpass the performance of tree-like ensembles, such as XGBoost or Random Forests, on small-sized datasets (less than 1k samples). To tackle this challenge, we introduce HyperTab, a hypernetwork-based approach to solving small sample problems on tabular datasets. By combining the advantages of Random Forests and neural networks, HyperTab generates an ensemble of neural networks, where each target model is specialized to process a specific lower-dimensional view of the data. Since each view plays the role of data augmentation, we virtually increase the number of training samples while keeping the number of trainable parameters unchanged, which prevents model overfitting. We evaluated HyperTab on more than 40 tabular datasets of a varying number of samples and domains of origin, and compared its performance with shallow and deep learning models representing the current state-of-the-art. We show that HyperTab consistently outranks other methods on small data (with a statistically significant difference) and scores comparable to them on larger datasets. We make a python package with the code available to download at https://pypi.org/project/hypertab/
    Don't Look into the Sun: Adversarial Solarization Attacks on Image Classifiers. (arXiv:2308.12661v1 [cs.CV])
    Assessing the robustness of deep neural networks against out-of-distribution inputs is crucial, especially in safety-critical domains like autonomous driving, but also in safety systems where malicious actors can digitally alter inputs to circumvent safety guards. However, designing effective out-of-distribution tests that encompass all possible scenarios while preserving accurate label information is a challenging task. Existing methodologies often entail a compromise between variety and constraint levels for attacks and sometimes even both. In a first step towards a more holistic robustness evaluation of image classification models, we introduce an attack method based on image solarization that is conceptually straightforward yet avoids jeopardizing the global structure of natural images independent of the intensity. Through comprehensive evaluations of multiple ImageNet models, we demonstrate the attack's capacity to degrade accuracy significantly, provided it is not integrated into the training augmentations. Interestingly, even then, no full immunity to accuracy deterioration is achieved. In other settings, the attack can often be simplified into a black-box attack with model-independent parameters. Defenses against other corruptions do not consistently extend to be effective against our specific attack. Project website: https://github.com/paulgavrikov/adversarial_solarization
    Conformal Prediction Regions for Time Series using Linear Complementarity Programming. (arXiv:2304.01075v3 [eess.SY] UPDATED)
    Conformal prediction is a statistical tool for producing prediction regions of machine learning models that are valid with high probability. However, applying conformal prediction to time series data leads to conservative prediction regions. In fact, to obtain prediction regions over $T$ time steps with confidence $1-\delta$, {previous works require that each individual prediction region is valid} with confidence $1-\delta/T$. We propose an optimization-based method for reducing this conservatism to enable long horizon planning and verification when using learning-enabled time series predictors. Instead of considering prediction errors individually at each time step, we consider a parameterized prediction error over multiple time steps. By optimizing the parameters over an additional dataset, we find prediction regions that are not conservative. We show that this problem can be cast as a mixed integer linear complementarity program (MILCP), which we then relax into a linear complementarity program (LCP). Additionally, we prove that the relaxed LP has the same optimal cost as the original MILCP. Finally, we demonstrate the efficacy of our method on case studies using pedestrian trajectory predictors and F16 fighter jet altitude predictors.
    On Uniformly Optimal Algorithms for Best Arm Identification in Two-Armed Bandits with Fixed Budget. (arXiv:2308.12000v2 [stat.ML] UPDATED)
    We study the problem of best-arm identification with fixed budget in stochastic two-arm bandits with Bernoulli rewards. We prove that surprisingly, there is no algorithm that (i) performs as well as the algorithm sampling each arm equally (this algorithm is referred to as the {\it uniform sampling} algorithm) on all instances, and that (ii) strictly outperforms this algorithm on at least one instance. In short, there is no algorithm better than the uniform sampling algorithm. Towards this result, we introduce the natural class of {\it consistent} and {\it stable} algorithms, and show that any algorithm that performs as well as the uniform sampling algorithm on all instances belongs to this class. The proof is completed by deriving a lower bound on the error rate satisfied by any consistent and stable algorithm, and by showing that the uniform sampling algorithm matches this lower bound. Our results provide a solution to the two open problems presented in \cite{qin2022open}.
    Optimal data pooling for shared learning in maintenance operations. (arXiv:2308.12670v1 [cs.LG])
    This paper addresses the benefits of pooling data for shared learning in maintenance operations. We consider a set of systems subject to Poisson degradation that are coupled through an a-priori unknown rate. Decision problems involving these systems are high-dimensional Markov decision processes (MDPs). We present a decomposition result that reduces such an MDP to two-dimensional MDPs, enabling structural analyses and computations. We leverage this decomposition to demonstrate that pooling data can lead to significant cost reductions compared to not pooling.
    Hypergraph Convolutional Networks for Fine-grained ICU Patient Similarity Analysis and Risk Prediction. (arXiv:2308.12575v1 [cs.LG])
    The Intensive Care Unit (ICU) is one of the most important parts of a hospital, which admits critically ill patients and provides continuous monitoring and treatment. Various patient outcome prediction methods have been attempted to assist healthcare professionals in clinical decision-making. Existing methods focus on measuring the similarity between patients using deep neural networks to capture the hidden feature structures. However, the higher-order relationships are ignored, such as patient characteristics (e.g., diagnosis codes) and their causal effects on downstream clinical predictions. In this paper, we propose a novel Hypergraph Convolutional Network that allows the representation of non-pairwise relationships among diagnosis codes in a hypergraph to capture the hidden feature structures so that fine-grained patient similarity can be calculated for personalized mortality risk prediction. Evaluation using a publicly available eICU Collaborative Research Database indicates that our method achieves superior performance over the state-of-the-art models on mortality risk prediction. Moreover, the results of several case studies demonstrated the effectiveness of constructing graph networks in providing good transparency and robustness in decision-making.
    Multi-fidelity Fourier Neural Operator for Fast Modeling of Large-Scale Geological Carbon Storage. (arXiv:2308.09113v2 [stat.ML] UPDATED)
    Deep learning-based surrogate models have been widely applied in geological carbon storage (GCS) problems to accelerate the prediction of reservoir pressure and CO2 plume migration. Large amounts of data from physics-based numerical simulators are required to train a model to accurately predict the complex physical behaviors associated with this process. In practice, the available training data are always limited in large-scale 3D problems due to the high computational cost. Therefore, we propose to use a multi-fidelity Fourier Neural Operator to solve large-scale GCS problems with more affordable multi-fidelity training datasets. The Fourier Neural Operator has a desirable grid-invariant property, which simplifies the transfer learning procedure between datasets with different discretization. We first test the model efficacy on a GCS reservoir model being discretized into 110k grid cells. The multi-fidelity model can predict with accuracy comparable to a high-fidelity model trained with the same amount of high-fidelity data with 81% less data generation costs. We further test the generalizability of the multi-fidelity model on a same reservoir model with a finer discretization of 1 million grid cells. This case was made more challenging by employing high-fidelity and low-fidelity datasets generated by different geostatistical models and reservoir simulators. We observe that the multi-fidelity FNO model can predict pressure fields with reasonable accuracy even when the high-fidelity data are extremely limited.
    Adversarial Training Using Feedback Loops. (arXiv:2308.11881v2 [cs.LG] UPDATED)
    Deep neural networks (DNN) have found wide applicability in numerous fields due to their ability to accurately learn very complex input-output relations. Despite their accuracy and extensive use, DNNs are highly susceptible to adversarial attacks due to limited generalizability. For future progress in the field, it is essential to build DNNs that are robust to any kind of perturbations to the data points. In the past, many techniques have been proposed to robustify DNNs using first-order derivative information of the network. This paper proposes a new robustification approach based on control theory. A neural network architecture that incorporates feedback control, named Feedback Neural Networks, is proposed. The controller is itself a neural network, which is trained using regular and adversarial data such as to stabilize the system outputs. The novel adversarial training approach based on the feedback control architecture is called Feedback Looped Adversarial Training (FLAT). Numerical results on standard test problems empirically show that our FLAT method is more effective than the state-of-the-art to guard against adversarial attacks.
    Unifying Gradients to Improve Real-world Robustness for Deep Networks. (arXiv:2208.06228v2 [stat.ML] UPDATED)
    The wide application of deep neural networks (DNNs) demands an increasing amount of attention to their real-world robustness, i.e., whether a DNN resists black-box adversarial attacks, among which score-based query attacks (SQAs) are most threatening since they can effectively hurt a victim network with the only access to model outputs. Defending against SQAs requires a slight but artful variation of outputs due to the service purpose for users, who share the same output information with SQAs. In this paper, we propose a real-world defense by Unifying Gradients (UniG) of different data so that SQAs could only probe a much weaker attack direction that is similar for different samples. Since such universal attack perturbations have been validated as less aggressive than the input-specific perturbations, UniG protects real-world DNNs by indicating attackers a twisted and less informative attack direction. We implement UniG efficiently by a Hadamard product module which is plug-and-play. According to extensive experiments on 5 SQAs, 2 adaptive attacks and 7 defense baselines, UniG significantly improves real-world robustness without hurting clean accuracy on CIFAR10 and ImageNet. For instance, UniG maintains a model of 77.80% accuracy under 2500-query Square attack while the state-of-the-art adversarially-trained model only has 67.34% on CIFAR10. Simultaneously, UniG outperforms all compared baselines in terms of clean accuracy and achieves the smallest modification of the model output. The code is released at https://github.com/snowien/UniG-pytorch.
    Open-set Face Recognition with Neural Ensemble, Maximal Entropy Loss and Feature Augmentation. (arXiv:2308.12371v1 [cs.CV])
    Open-set face recognition refers to a scenario in which biometric systems have incomplete knowledge of all existing subjects. Therefore, they are expected to prevent face samples of unregistered subjects from being identified as previously enrolled identities. This watchlist context adds an arduous requirement that calls for the dismissal of irrelevant faces by focusing mainly on subjects of interest. As a response, this work introduces a novel method that associates an ensemble of compact neural networks with a margin-based cost function that explores additional samples. Supplementary negative samples can be obtained from external databases or synthetically built at the representation level in training time with a new mix-up feature augmentation approach. Deep neural networks pre-trained on large face datasets serve as the preliminary feature extraction module. We carry out experiments on well-known LFW and IJB-C datasets where results show that the approach is able to boost closed and open-set identification rates.
    APART: Diverse Skill Discovery using All Pairs with Ascending Reward and DropouT. (arXiv:2308.12649v1 [cs.LG])
    We study diverse skill discovery in reward-free environments, aiming to discover all possible skills in simple grid-world environments where prior methods have struggled to succeed. This problem is formulated as mutual training of skills using an intrinsic reward and a discriminator trained to predict a skill given its trajectory. Our initial solution replaces the standard one-vs-all (softmax) discriminator with a one-vs-one (all pairs) discriminator and combines it with a novel intrinsic reward function and a dropout regularization technique. The combined approach is named APART: Diverse Skill Discovery using All Pairs with Ascending Reward and Dropout. We demonstrate that APART discovers all the possible skills in grid worlds with remarkably fewer samples than previous works. Motivated by the empirical success of APART, we further investigate an even simpler algorithm that achieves maximum skills by altering VIC, rescaling its intrinsic reward, and tuning the temperature of its softmax discriminator. We believe our findings shed light on the crucial factors underlying success of skill discovery algorithms in reinforcement learning.
    Job Shop Scheduling Benchmark: Environments and Instances for Learning and Non-learning Methods. (arXiv:2308.12794v1 [cs.AI])
    We introduce an open-source GitHub repository containing comprehensive benchmarks for a wide range of machine scheduling problems, including Job Shop Scheduling (JSP), Flow Shop Scheduling (FSP), Flexible Job Shop Scheduling (FJSP), FJSP with Assembly constraints (FAJSP), FJSP with Sequence-Dependent Setup Times (FJSP-SDST), and the online FJSP (with online job arrivals). Our primary goal is to provide a centralized hub for researchers, practitioners, and enthusiasts interested in tackling machine scheduling challenges.
    Auto-weighted Bayesian Physics-Informed Neural Networks and robust estimations for multitask inverse problems in pore-scale imaging of dissolution. (arXiv:2308.12864v1 [cs.LG])
    In this article, we present a novel data assimilation strategy in pore-scale imaging and demonstrate that this makes it possible to robustly address reactive inverse problems incorporating Uncertainty Quantification (UQ). Pore-scale modeling of reactive flow offers a valuable opportunity to investigate the evolution of macro-scale properties subject to dynamic processes. Yet, they suffer from imaging limitations arising from the associated X-ray microtomography (X-ray microCT) process, which induces discrepancies in the properties estimates. Assessment of the kinetic parameters also raises challenges, as reactive coefficients are critical parameters that can cover a wide range of values. We account for these two issues and ensure reliable calibration of pore-scale modeling, based on dynamical microCT images, by integrating uncertainty quantification in the workflow. The present method is based on a multitasking formulation of reactive inverse problems combining data-driven and physics-informed techniques in calcite dissolution. This allows quantifying morphological uncertainties on the porosity field and estimating reactive parameter ranges through prescribed PDE models with a latent concentration field and dynamical microCT. The data assimilation strategy relies on sequential reinforcement incorporating successively additional PDE constraints. We guarantee robust and unbiased uncertainty quantification by straightforward adaptive weighting of Bayesian Physics-Informed Neural Networks (BPINNs), ensuring reliable micro-porosity changes during geochemical transformations. We demonstrate successful Bayesian Inference in 1D+Time and 2D+Time calcite dissolution based on synthetic microCT images with meaningful posterior distribution on the reactive parameters and dimensionless numbers.
    Minimum intrinsic dimension scaling for entropic optimal transport. (arXiv:2306.03398v2 [math.ST] UPDATED)
    Motivated by the manifold hypothesis, which states that data with a high extrinsic dimension may yet have a low intrinsic dimension, we develop refined statistical bounds for entropic optimal transport that are sensitive to the intrinsic dimension of the data. Our bounds involve a robust notion of intrinsic dimension, measured at only a single distance scale depending on the regularization parameter, and show that it is only the minimum of these single-scale intrinsic dimensions which governs the rate of convergence. We call this the Minimum Intrinsic Dimension scaling (MID scaling) phenomenon, and establish MID scaling with no assumptions on the data distributions so long as the cost is bounded and Lipschitz, and for various entropic optimal transport quantities beyond just values, with stronger analogs when one distribution is supported on a manifold. Our results significantly advance the theoretical state of the art by showing that MID scaling is a generic phenomenon, and provide the first rigorous interpretation of the statistical effect of entropic regularization as a distance scale.
    BadVFL: Backdoor Attacks in Vertical Federated Learning. (arXiv:2304.08847v2 [cs.LG] UPDATED)
    Federated learning (FL) enables multiple parties to collaboratively train a machine learning model without sharing their data; rather, they train their own model locally and send updates to a central server for aggregation. Depending on how the data is distributed among the participants, FL can be classified into Horizontal (HFL) and Vertical (VFL). In VFL, the participants share the same set of training instances but only host a different and non-overlapping subset of the whole feature space. Whereas in HFL, each participant shares the same set of features while the training set is split into locally owned training data subsets. VFL is increasingly used in applications like financial fraud detection; nonetheless, very little work has analyzed its security. In this paper, we focus on robustness in VFL, in particular, on backdoor attacks, whereby an adversary attempts to manipulate the aggregate model during the training process to trigger misclassifications. Performing backdoor attacks in VFL is more challenging than in HFL, as the adversary i) does not have access to the labels during training and ii) cannot change the labels as she only has access to the feature embeddings. We present a first-of-its-kind clean-label backdoor attack in VFL, which consists of two phases: a label inference and a backdoor phase. We demonstrate the effectiveness of the attack on three different datasets, investigate the factors involved in its success, and discuss countermeasures to mitigate its impact.
    A Survey on Blood Pressure Measurement Technologies: Addressing Potential Sources of Bias. (arXiv:2306.08451v2 [physics.med-ph] UPDATED)
    Regular blood pressure (BP) monitoring in clinical and ambulatory settings plays a crucial role in the prevention, diagnosis, treatment, and management of cardiovascular diseases. Recently, the widespread adoption of ambulatory BP measurement devices has been driven predominantly by the increased prevalence of hypertension and its associated risks and clinical conditions. Recent guidelines advocate for regular BP monitoring as part of regular clinical visits or even at home. This increased utilization of BP measurement technologies has brought up significant concerns, regarding the accuracy of reported BP values across settings. In this survey, focusing mainly on cuff-based BP monitoring technologies, we highlight how BP measurements can demonstrate substantial biases and variances due to factors such as measurement and device errors, demographics, and body habitus. With these inherent biases, the development of a new generation of cuff-based BP devices which use artificial-intelligence (AI) has significant potential. We present future avenues where AI-assisted technologies can leverage the extensive clinical literature on BP-related studies together with the large collections of BP records available in electronic health records. These resources can be combined with machine learning approaches, including deep learning and Bayesian inference, to remove BP measurement biases and to provide individualized BP-related cardiovascular risk indexes.
    BagPipe: Accelerating Deep Recommendation Model Training. (arXiv:2202.12429v3 [cs.DC] UPDATED)
    Deep learning based recommendation models (DLRM) are widely used in several business critical applications. Training such recommendation models efficiently is challenging because they contain billions of embedding-based parameters, leading to significant overheads from embedding access. By profiling existing systems for DLRM training, we observe that around 75\% of the iteration time is spent on embedding access and model synchronization. Our key insight in this paper is that embedding access has a specific structure which can be used to accelerate training. We observe that embedding accesses are heavily skewed, with around 1\% of embeddings representing more than 92\% of total accesses. Further, we observe that during offline training we can lookahead at future batches to determine exactly which embeddings will be needed at what iteration in the future. Based on these insights, we develop Bagpipe, a system for training deep recommendation models that uses caching and prefetching to overlap remote embedding accesses with the computation. We design an Oracle Cacher, a new component that uses a lookahead algorithm to generate optimal cache update decisions while providing strong consistency guarantees against staleness. We also design a logically replicated, physically partitioned cache and show that our design can reduce synchronization overheads in a distributed setting. Finally, we propose a disaggregated system architecture and show that our design can enable low-overhead fault tolerance. Our experiments using three datasets and four models show that Bagpipe provides a speed up of up to 5.6x compared to state of the art baselines, while providing the same convergence and reproducibility guarantees as synchronous training.
    Human Comprehensible Active Learning of Genome-Scale Metabolic Networks. (arXiv:2308.12740v1 [cs.AI])
    An important application of Synthetic Biology is the engineering of the host cell system to yield useful products. However, an increase in the scale of the host system leads to huge design space and requires a large number of validation trials with high experimental costs. A comprehensible machine learning approach that efficiently explores the hypothesis space and guides experimental design is urgently needed for the Design-Build-Test-Learn (DBTL) cycle of the host cell system. We introduce a novel machine learning framework ILP-iML1515 based on Inductive Logic Programming (ILP) that performs abductive logical reasoning and actively learns from training examples. In contrast to numerical models, ILP-iML1515 is built on comprehensible logical representations of a genome-scale metabolic model and can update the model by learning new logical structures from auxotrophic mutant trials. The ILP-iML1515 framework 1) allows high-throughput simulations and 2) actively selects experiments that reduce the experimental cost of learning gene functions in comparison to randomly selected experiments.
    Motion In-Betweening with Phase Manifolds. (arXiv:2308.12751v1 [cs.GR])
    This paper introduces a novel data-driven motion in-betweening system to reach target poses of characters by making use of phases variables learned by a Periodic Autoencoder. Our approach utilizes a mixture-of-experts neural network model, in which the phases cluster movements in both space and time with different expert weights. Each generated set of weights then produces a sequence of poses in an autoregressive manner between the current and target state of the character. In addition, to satisfy poses which are manually modified by the animators or where certain end effectors serve as constraints to be reached by the animation, a learned bi-directional control scheme is implemented to satisfy such constraints. The results demonstrate that using phases for motion in-betweening tasks sharpen the interpolated movements, and furthermore stabilizes the learning process. Moreover, using phases for motion in-betweening tasks can also synthesize more challenging movements beyond locomotion behaviors. Additionally, style control is enabled between given target keyframes. Our proposed framework can compete with popular state-of-the-art methods for motion in-betweening in terms of motion quality and generalization, especially in the existence of long transition durations. Our framework contributes to faster prototyping workflows for creating animated character sequences, which is of enormous interest for the game and film industry.
    Near Optimal Adversarial Attack on UCB Bandits. (arXiv:2008.09312v6 [cs.LG] UPDATED)
    I study a stochastic multi-arm bandit problem where rewards are subject to adversarial corruption. I propose a novel attack strategy that manipulates a learner employing the UCB algorithm into pulling some non-optimal target arm $T - o(T)$ times with a cumulative cost that scales as $\widehat{O}(\sqrt{\log T})$, where $T$ is the number of rounds. I also prove the first lower bound on the cumulative attack cost. The lower bound matches the upper bound up to $O(\log \log T)$ factors, showing the proposed attack strategy to be near optimal.
    Prediction without Preclusion: Recourse Verification with Reachable Sets. (arXiv:2308.12820v1 [cs.LG])
    Machine learning models are often used to decide who will receive a loan, a job interview, or a public benefit. Standard techniques to build these models use features about people but overlook their actionability. In turn, models can assign predictions that are fixed, meaning that consumers who are denied loans, interviews, or benefits may be permanently locked out from access to credit, employment, or assistance. In this work, we introduce a formal testing procedure to flag models that assign fixed predictions that we call recourse verification. We develop machinery to reliably determine if a given model can provide recourse to its decision subjects from a set of user-specified actionability constraints. We demonstrate how our tools can ensure recourse and adversarial robustness in real-world datasets and use them to study the infeasibility of recourse in real-world lending datasets. Our results highlight how models can inadvertently assign fixed predictions that permanently bar access, and we provide tools to design algorithms that account for actionability when developing models.
    Collect, Measure, Repeat: Reliability Factors for Responsible AI Data Collection. (arXiv:2308.12885v1 [cs.LG])
    The rapid entry of machine learning approaches in our daily activities and high-stakes domains demands transparency and scrutiny of their fairness and reliability. To help gauge machine learning models' robustness, research typically focuses on the massive datasets used for their deployment, e.g., creating and maintaining documentation for understanding their origin, process of development, and ethical considerations. However, data collection for AI is still typically a one-off practice, and oftentimes datasets collected for a certain purpose or application are reused for a different problem. Additionally, dataset annotations may not be representative over time, contain ambiguous or erroneous annotations, or be unable to generalize across issues or domains. Recent research has shown these practices might lead to unfair, biased, or inaccurate outcomes. We argue that data collection for AI should be performed in a responsible manner where the quality of the data is thoroughly scrutinized and measured through a systematic set of appropriate metrics. In this paper, we propose a Responsible AI (RAI) methodology designed to guide the data collection with a set of metrics for an iterative in-depth analysis of the factors influencing the quality and reliability} of the generated data. We propose a granular set of measurements to inform on the internal reliability of a dataset and its external stability over time. We validate our approach across nine existing datasets and annotation tasks and four content modalities. This approach impacts the assessment of data robustness used for AI applied in the real world, where diversity of users and content is eminent. Furthermore, it deals with fairness and accountability aspects in data collection by providing systematic and transparent quality analysis for data collections.
    The Polynomial Method is Universal for Distribution-Free Correlational SQ Learning. (arXiv:2010.11925v3 [cs.DS] UPDATED)
    We consider the problem of distribution-free learning for Boolean function classes in the PAC and agnostic models. Generalizing a beautiful work of Malach and Shalev-Shwartz (2022) that gave tight correlational SQ (CSQ) lower bounds for learning DNF formulas, we give new proofs that lower bounds on the threshold or approximate degree of any function class directly imply CSQ lower bounds for PAC or agnostic learning respectively. While such bounds implicitly follow by combining prior results by Feldman (2008, 2012) and Sherstov (2008, 2011), to our knowledge the precise statements we give had not appeared in this form before. Moreover, our proofs are simple and largely self-contained. These lower bounds match corresponding positive results using upper bounds on the threshold or approximate degree in the SQ model for PAC or agnostic learning, and in this sense these results show that the polynomial method is a universal, best-possible approach for distribution-free CSQ learning.
    Uncertainty and Explainable Analysis of Machine Learning Model for Reconstruction of Sonic Slowness Logs. (arXiv:2308.12625v1 [cs.LG])
    Logs are valuable information for oil and gas fields as they help to determine the lithology of the formations surrounding the borehole and the location and reserves of subsurface oil and gas reservoirs. However, important logs are often missing in horizontal or old wells, which poses a challenge in field applications. In this paper, we utilize data from the 2020 machine learning competition of the SPWLA, which aims to predict the missing compressional wave slowness and shear wave slowness logs using other logs in the same borehole. We employ the NGBoost algorithm to construct an Ensemble Learning model that can predicate the results as well as their uncertainty. Furthermore, we combine the SHAP method to investigate the interpretability of the machine learning model. We compare the performance of the NGBosst model with four other commonly used Ensemble Learning methods, including Random Forest, GBDT, XGBoost, LightGBM. The results show that the NGBoost model performs well in the testing set and can provide a probability distribution for the prediction results. In addition, the variance of the probability distribution of the predicted log can be used to justify the quality of the constructed log. Using the SHAP explainable machine learning model, we calculate the importance of each input log to the predicted results as well as the coupling relationship among input logs. Our findings reveal that the NGBoost model tends to provide greater slowness prediction results when the neutron porosity and gamma ray are large, which is consistent with the cognition of petrophysical models. Furthermore, the machine learning model can capture the influence of the changing borehole caliper on slowness, where the influence of borehole caliper on slowness is complex and not easy to establish a direct relationship. These findings are in line with the physical principle of borehole acoustics.
    Universal Soldier: Using Universal Adversarial Perturbations for Detecting Backdoor Attacks. (arXiv:2302.00747v3 [cs.LG] UPDATED)
    Deep learning models achieve excellent performance in numerous machine learning tasks. Yet, they suffer from security-related issues such as adversarial examples and poisoning (backdoor) attacks. A deep learning model may be poisoned by training with backdoored data or by modifying inner network parameters. Then, a backdoored model performs as expected when receiving a clean input, but it misclassifies when receiving a backdoored input stamped with a pre-designed pattern called "trigger". Unfortunately, it is difficult to distinguish between clean and backdoored models without prior knowledge of the trigger. This paper proposes a backdoor detection method by utilizing a special type of adversarial attack, universal adversarial perturbation (UAP), and its similarities with a backdoor trigger. We observe an intuitive phenomenon: UAPs generated from backdoored models need fewer perturbations to mislead the model than UAPs from clean models. UAPs of backdoored models tend to exploit the shortcut from all classes to the target class, built by the backdoor trigger. We propose a novel method called Universal Soldier for Backdoor detection (USB) and reverse engineering potential backdoor triggers via UAPs. Experiments on 345 models trained on several datasets show that USB effectively detects the injected backdoor and provides comparable or better results than state-of-the-art methods.
    Inverse Lithography Physics-informed Deep Neural Level Set for Mask Optimization. (arXiv:2308.12299v1 [eess.IV])
    As the feature size of integrated circuits continues to decrease, optical proximity correction (OPC) has emerged as a crucial resolution enhancement technology for ensuring high printability in the lithography process. Recently, level set-based inverse lithography technology (ILT) has drawn considerable attention as a promising OPC solution, showcasing its powerful pattern fidelity, especially in advanced process. However, massive computational time consumption of ILT limits its applicability to mainly correcting partial layers and hotspot regions. Deep learning (DL) methods have shown great potential in accelerating ILT. However, lack of domain knowledge of inverse lithography limits the ability of DL-based algorithms in process window (PW) enhancement and etc. In this paper, we propose an inverse lithography physics-informed deep neural level set (ILDLS) approach for mask optimization. This approach utilizes level set based-ILT as a layer within the DL framework and iteratively conducts mask prediction and correction to significantly enhance printability and PW in comparison with results from pure DL and ILT. With this approach, computation time is reduced by a few orders of magnitude versus ILT. By gearing up DL with knowledge of inverse lithography physics, ILDLS provides a new and efficient mask optimization solution.
    ICU Mortality Prediction Using Long Short-Term Memory Networks. (arXiv:2308.12800v1 [cs.LG])
    Extensive bedside monitoring in Intensive Care Units (ICUs) has resulted in complex temporal data regarding patient physiology, which presents an upscale context for clinical data analysis. In the other hand, identifying the time-series patterns within these data may provide a high aptitude to predict clinical events. Hence, we investigate, during this work, the implementation of an automatic data-driven system, which analyzes large amounts of multivariate temporal data derived from Electronic Health Records (EHRs), and extracts high-level information so as to predict in-hospital mortality and Length of Stay (LOS) early. Practically, we investigate the applicability of LSTM network by reducing the time-frame to 6-hour so as to enhance clinical tasks. The experimental results highlight the efficiency of LSTM model with rigorous multivariate time-series measurements for building real-world prediction engines.
    Fast Exact NPN Classification with Influence-aided Canonical Form. (arXiv:2308.12311v1 [cs.LG])
    NPN classification has many applications in the synthesis and verification of digital circuits. The canonical-form-based method is the most common approach, designing a canonical form as representative for the NPN equivalence class first and then computing the transformation function according to the canonical form. Most works use variable symmetries and several signatures, mainly based on the cofactor, to simplify the canonical form construction and computation. This paper describes a novel canonical form and its computation algorithm by introducing Boolean influence to NPN classification, which is a basic concept in analysis of Boolean functions. We show that influence is input-negation-independent, input-permutation-dependent, and has other structural information than previous signatures for NPN classification. Therefore, it is a significant ingredient in speeding up NPN classification. Experimental results prove that influence plays an important role in reducing the transformation enumeration in computing the canonical form. Compared with the state-of-the-art algorithm implemented in ABC, our influence-aided canonical form for exact NPN classification gains up to 5.5x speedup.
    A Greedy Approach for Offering to Telecom Subscribers. (arXiv:2308.12606v1 [stat.ML])
    Customer retention or churn prevention is a challenging task of a telecom operator. One of the effective approaches is to offer some attractive incentive or additional services or money to the subscribers for keeping them engaged and make sure they stay in the operator's network for longer time. Often, operators allocate certain amount of monetary budget to carry out the offer campaign. The difficult part of this campaign is the selection of a set of customers from a large subscriber-base and deciding the amount that should be offered to an individual so that operator's objective is achieved. There may be multiple objectives (e.g., maximizing revenue, minimizing number of churns) for selection of subscriber and selection of an offer to the selected subscriber. Apart from monetary benefit, offers may include additional data, SMS, hots-spot tethering, and many more. This problem is known as offer optimization. In this paper, we propose a novel combinatorial algorithm for solving offer optimization under heterogeneous offers by maximizing expected revenue under the scenario of subscriber churn, which is, in general, seen in telecom domain. The proposed algorithm is efficient and accurate even for a very large subscriber-base.
    Interneurons accelerate learning dynamics in recurrent neural networks for statistical adaptation. (arXiv:2209.10634v2 [q-bio.NC] UPDATED)
    Early sensory systems in the brain rapidly adapt to fluctuating input statistics, which requires recurrent communication between neurons. Mechanistically, such recurrent communication is often indirect and mediated by local interneurons. In this work, we explore the computational benefits of mediating recurrent communication via interneurons compared with direct recurrent connections. To this end, we consider two mathematically tractable recurrent linear neural networks that statistically whiten their inputs -- one with direct recurrent connections and the other with interneurons that mediate recurrent communication. By analyzing the corresponding continuous synaptic dynamics and numerically simulating the networks, we show that the network with interneurons is more robust to initialization than the network with direct recurrent connections in the sense that the convergence time for the synaptic dynamics in the network with interneurons (resp. direct recurrent connections) scales logarithmically (resp. linearly) with the spectrum of their initialization. Our results suggest that interneurons are computationally useful for rapid adaptation to changing input statistics. Interestingly, the network with interneurons is an overparameterized solution of the whitening objective for the network with direct recurrent connections, so our results can be viewed as a recurrent linear neural network analogue of the implicit acceleration phenomenon observed in overparameterized feedforward linear neural networks.
    Dense Text-to-Image Generation with Attention Modulation. (arXiv:2308.12964v1 [cs.CV])
    Existing text-to-image diffusion models struggle to synthesize realistic images given dense captions, where each text prompt provides a detailed description for a specific image region. To address this, we propose DenseDiffusion, a training-free method that adapts a pre-trained text-to-image model to handle such dense captions while offering control over the scene layout. We first analyze the relationship between generated images' layouts and the pre-trained model's intermediate attention maps. Next, we develop an attention modulation method that guides objects to appear in specific regions according to layout guidance. Without requiring additional fine-tuning or datasets, we improve image generation performance given dense captions regarding both automatic and human evaluation scores. In addition, we achieve similar-quality visual results with models specifically trained with layout conditions.
    NeO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes. (arXiv:2308.12967v1 [cs.CV])
    Recent implicit neural representations have shown great results for novel view synthesis. However, existing methods require expensive per-scene optimization from many views hence limiting their application to real-world unbounded urban settings where the objects of interest or backgrounds are observed from very few views. To mitigate this challenge, we introduce a new approach called NeO 360, Neural fields for sparse view synthesis of outdoor scenes. NeO 360 is a generalizable method that reconstructs 360{\deg} scenes from a single or a few posed RGB images. The essence of our approach is in capturing the distribution of complex real-world outdoor 3D scenes and using a hybrid image-conditional triplanar representation that can be queried from any world point. Our representation combines the best of both voxel-based and bird's-eye-view (BEV) representations and is more effective and expressive than each. NeO 360's representation allows us to learn from a large collection of unbounded 3D scenes while offering generalizability to new views and novel scenes from as few as a single image during inference. We demonstrate our approach on the proposed challenging 360{\deg} unbounded dataset, called NeRDS 360, and show that NeO 360 outperforms state-of-the-art generalizable methods for novel view synthesis while also offering editing and composition capabilities. Project page: https://zubair-irshad.github.io/projects/neo360.html
    StableDR: Stabilized Doubly Robust Learning for Recommendation on Data Missing Not at Random. (arXiv:2205.04701v3 [cs.LG] UPDATED)
    In recommender systems, users always choose the favorite items to rate, which leads to data missing not at random and poses a great challenge for unbiased evaluation and learning of prediction models. Currently, the doubly robust (DR) methods have been widely studied and demonstrate superior performance. However, in this paper, we show that DR methods are unstable and have unbounded bias, variance, and generalization bounds to extremely small propensities. Moreover, the fact that DR relies more on extrapolation will lead to suboptimal performance. To address the above limitations while retaining double robustness, we propose a stabilized doubly robust (StableDR) learning approach with a weaker reliance on extrapolation. Theoretical analysis shows that StableDR has bounded bias, variance, and generalization error bound simultaneously under inaccurate imputed errors and arbitrarily small propensities. In addition, we propose a novel learning approach for StableDR that updates the imputation, propensity, and prediction models cyclically, achieving more stable and accurate predictions. Extensive experiments show that our approaches significantly outperform the existing methods.
    Dealing with Small Datasets for Deep Learning in Medical Imaging: An Evaluation of Self-Supervised Pre-Training on CT Scans Comparing Contrastive and Masked Autoencoder Methods for Convolutional Models. (arXiv:2308.06534v2 [cs.CV] UPDATED)
    Deep learning in medical imaging has the potential to minimize the risk of diagnostic errors, reduce radiologist workload, and accelerate diagnosis. Training such deep learning models requires large and accurate datasets, with annotations for all training samples. However, in the medical imaging domain, annotated datasets for specific tasks are often small due to the high complexity of annotations, limited access, or the rarity of diseases. To address this challenge, deep learning models can be pre-trained on large image datasets without annotations using methods from the field of self-supervised learning. After pre-training, small annotated datasets are sufficient to fine-tune the models for a specific task. The most popular self-supervised pre-training approaches in medical imaging are based on contrastive learning. However, recent studies in natural image processing indicate a strong potential for masked autoencoder approaches. Our work compares state-of-the-art contrastive learning methods with the recently introduced masked autoencoder approach "SparK" for convolutional neural networks (CNNs) on medical images. Therefore we pre-train on a large unannotated CT image dataset and fine-tune on several CT classification tasks. Due to the challenge of obtaining sufficient annotated training data in medical imaging, it is of particular interest to evaluate how the self-supervised pre-training methods perform when fine-tuning on small datasets. By experimenting with gradually reducing the training dataset size for fine-tuning, we find that the reduction has different effects depending on the type of pre-training chosen. The SparK pre-training method is more robust to the training dataset size than the contrastive methods. Based on our results, we propose the SparK pre-training for medical imaging tasks with only small annotated datasets.
    Integer Factorisation, Fermat & Machine Learning on a Classical Computer. (arXiv:2308.12290v1 [cs.LG])
    In this paper we describe a deep learning--based probabilistic algorithm for integer factorisation. We use Lawrence's extension of Fermat's factorisation algorithm to reduce the integer factorisation problem to a binary classification problem. To address the classification problem, based on the ease of generating large pseudo--random primes, a corpus of training data, as large as needed, is synthetically generated. We will introduce the algorithm, summarise some experiments, analyse where these experiments fall short, and finally put out a call to others to reproduce, verify and see if this approach can be improved to a point where it becomes a practical, scalable factorisation algorithm.
    Easy attention: A simple self-attention mechanism for Transformers. (arXiv:2308.12874v1 [cs.LG])
    To improve the robustness of transformer neural networks used for temporal-dynamics prediction of chaotic systems, we propose a novel attention mechanism called easy attention. Due to the fact that self attention only makes usage of the inner product of queries and keys, it is demonstrated that the keys, queries and softmax are not necessary for obtaining the attention score required to capture long-term dependencies in temporal sequences. Through implementing singular-value decomposition (SVD) on the softmax attention score, we further observe that the self attention compresses contribution from both queries and keys in the spanned space of the attention score. Therefore, our proposed easy-attention method directly treats the attention scores as learnable parameters. This approach produces excellent results when reconstructing and predicting the temporal dynamics of chaotic systems exhibiting more robustness and less complexity than the self attention or the widely-used long short-term memory (LSTM) network. Our results show great potential for applications in more complex high-dimensional dynamical systems.
    Deep Reinforcement Learning-driven Cross-Community Energy Interaction Optimal Scheduling. (arXiv:2308.12554v1 [eess.SY])
    In order to coordinate energy interactions among various communities and energy conversions among multi-energy subsystems within the multi-community integrated energy system under uncertain conditions, and achieve overall optimization and scheduling of the comprehensive energy system, this paper proposes a comprehensive scheduling model that utilizes a multi-agent deep reinforcement learning algorithm to learn load characteristics of different communities and make decisions based on this knowledge. In this model, the scheduling problem of the integrated energy system is transformed into a Markov decision process and solved using a data-driven deep reinforcement learning algorithm, which avoids the need for modeling complex energy coupling relationships between multi-communities and multi-energy subsystems. The simulation results show that the proposed method effectively captures the load characteristics of different communities and utilizes their complementary features to coordinate reasonable energy interactions among them. This leads to a reduction in wind curtailment rate from 16.3% to 0% and lowers the overall operating cost by 5445.6 Yuan, demonstrating significant economic and environmental benefits.
    LANISTR: Multimodal Learning from Structured and Unstructured Data. (arXiv:2305.16556v2 [cs.LG] UPDATED)
    Multimodal large-scale pretraining has shown impressive performance for unstructured data including language, image, audio, and video. However, a prevalent real-world scenario involves the combination of structured data types (tabular, time-series) with unstructured data which has so far been understudied. To bridge this gap, we propose LANISTR, an attention-based framework to learn from LANguage, Image, and STRuctured data. The core of LANISTR's methodology is rooted in \textit{masking-based} training applied across both unimodal and multimodal levels. In particular, we introduce a new similarity-based multimodal masking loss that enables it to learn cross-modal relations from large-scale multimodal data with missing modalities. On two real-world datastes, MIMIC-IV (healthcare) and Amazon Product Review (retail), LANISTR demonstrates remarkable absolute improvements of 6.6\% (AUROC) and up to 14\% (accuracy) when fine-tuned on 0.1\% and 0.01\% of labeled data, respectively, compared to the state-of-the-art alternatives. Notably, these improvements are observed even in the presence of considerable missingness ratios of 35.7\% and 99.8\%, in the respective datasets.
    Transforming to Yoked Neural Networks to Improve ANN Structure. (arXiv:2306.02157v3 [cs.LG] UPDATED)
    Most existing classical artificial neural networks (ANN) are designed as a tree structure to imitate neural networks. In this paper, we argue that the connectivity of a tree is not sufficient to characterize a neural network. The nodes of the same level of a tree cannot be connected with each other, i.e., these neural unit cannot share information with each other, which is a major drawback of ANN. Although ANN has been significantly improved in recent years to more complex structures, such as the directed acyclic graph (DAG), these methods also have unidirectional and acyclic bias for ANN. In this paper, we propose a method to build a bidirectional complete graph for the nodes in the same level of an ANN, which yokes the nodes of the same level to formulate a neural module. We call our model as YNN in short. YNN promotes the information transfer significantly which obviously helps in improving the performance of the method. Our YNN can imitate neural networks much better compared with the traditional ANN. In this paper, we analyze the existing structural bias of ANN and propose a model YNN to efficiently eliminate such structural bias. In our model, nodes also carry out aggregation and transformation of features, and edges determine the flow of information. We further impose auxiliary sparsity constraint to the distribution of connectedness, which promotes the learned structure to focus on critical connections. Finally, based on the optimized structure, we also design small neural module structure based on the minimum cut technique to reduce the computational burden of the YNN model. This learning process is compatible with the existing networks and different tasks. The obtained quantitative experimental results reflect that the learned connectivity is superior to the traditional NN structure.
    BridgeData V2: A Dataset for Robot Learning at Scale. (arXiv:2308.12952v1 [cs.RO])
    We introduce BridgeData V2, a large and diverse dataset of robotic manipulation behaviors designed to facilitate research on scalable robot learning. BridgeData V2 contains 60,096 trajectories collected across 24 environments on a publicly available low-cost robot. BridgeData V2 provides extensive task and environment variability, leading to skills that can generalize across environments, domains, and institutions, making the dataset a useful resource for a broad range of researchers. Additionally, the dataset is compatible with a wide variety of open-vocabulary, multi-task learning methods conditioned on goal images or natural language instructions. In our experiments, we train 6 state-of-the-art imitation learning and offline reinforcement learning methods on our dataset, and find that they succeed on a suite of tasks requiring varying amounts of generalization. We also demonstrate that the performance of these methods improves with more data and higher capacity models, and that training on a greater variety of skills leads to improved generalization. By publicly sharing BridgeData V2 and our pre-trained models, we aim to accelerate research in scalable robot learning methods. Project page at https://rail-berkeley.github.io/bridgedata
    LCANets++: Robust Audio Classification using Multi-layer Neural Networks with Lateral Competition. (arXiv:2308.12882v1 [cs.SD])
    Audio classification aims at recognizing audio signals, including speech commands or sound events. However, current audio classifiers are susceptible to perturbations and adversarial attacks. In addition, real-world audio classification tasks often suffer from limited labeled data. To help bridge these gaps, previous work developed neuro-inspired convolutional neural networks (CNNs) with sparse coding via the Locally Competitive Algorithm (LCA) in the first layer (i.e., LCANets) for computer vision. LCANets learn in a combination of supervised and unsupervised learning, reducing dependency on labeled samples. Motivated by the fact that auditory cortex is also sparse, we extend LCANets to audio recognition tasks and introduce LCANets++, which are CNNs that perform sparse coding in multiple layers via LCA. We demonstrate that LCANets++ are more robust than standard CNNs and LCANets against perturbations, e.g., background noise, as well as black-box and white-box attacks, e.g., evasion and fast gradient sign (FGSM) attacks.
    Graph Ladling: Shockingly Simple Parallel GNN Training without Intermediate Communication. (arXiv:2306.10466v2 [cs.LG] UPDATED)
    Graphs are omnipresent and GNNs are a powerful family of neural networks for learning over graphs. Despite their popularity, scaling GNNs either by deepening or widening suffers from prevalent issues of unhealthy gradients, over-smoothening, information squashing, which often lead to sub-standard performance. In this work, we are interested in exploring a principled way to scale GNNs capacity without deepening or widening, which can improve its performance across multiple small and large graphs. Motivated by the recent intriguing phenomenon of model soups, which suggest that fine-tuned weights of multiple large-language pre-trained models can be merged to a better minima, we argue to exploit the fundamentals of model soups to mitigate the aforementioned issues of memory bottleneck and trainability during GNNs scaling. More specifically, we propose not to deepen or widen current GNNs, but instead present a data-centric perspective of model soups tailored for GNNs, i.e., to build powerful GNNs. By dividing giant graph data, we build multiple independently and parallelly trained weaker GNNs (soup ingredient) without any intermediate communication, and combine their strength using a greedy interpolation soup procedure to achieve state-of-the-art performance. Compared to concurrent distributed GNN training works such as Jiong et. al. 2023, we train each soup ingredient by sampling different subgraphs per epoch and their respective sub-models are merged only after being fully trained (rather than intermediately so). Moreover, we provide a wide variety of model soup preparation techniques by leveraging state-of-the-art graph sampling and graph partitioning approaches that can handle large graphs. Codes are available at: \url{https://github.com/VITA-Group/graph_ladling}.
    Conditional expectation using compactification operators. (arXiv:2306.10592v3 [stat.ML] UPDATED)
    The separate tasks of denoising, least squares expectation, and manifold learning can often be posed in a common setting of finding the conditional expectations arising from a product of two random variables. This paper focuses on this more general problem and describes an operator theoretic approach to estimating the conditional expectation. Kernel integral operators are used as a compactification tool, to set up the estimation problem as a linear inverse problem in a reproducing kernel Hilbert space. This equation is shown to have solutions that allow numerical approximation, thus guaranteeing the convergence of data-driven implementations. The overall technique is easy to implement, and their successful application to some real-world problems are also shown.
    A Co-training Approach for Noisy Time Series Learning. (arXiv:2308.12551v1 [cs.LG])
    In this work, we focus on robust time series representation learning. Our assumption is that real-world time series is noisy and complementary information from different views of the same time series plays an important role while analyzing noisy input. Based on this, we create two views for the input time series through two different encoders. We conduct co-training based contrastive learning iteratively to learn the encoders. Our experiments demonstrate that this co-training approach leads to a significant improvement in performance. Especially, by leveraging the complementary information from different views, our proposed TS-CoT method can mitigate the impact of data noise and corruption. Empirical evaluations on four time series benchmarks in unsupervised and semi-supervised settings reveal that TS-CoT outperforms existing methods. Furthermore, the representations learned by TS-CoT can transfer well to downstream tasks through fine-tuning.
    Try with Simpler -- An Evaluation of Improved Principal Component Analysis in Log-based Anomaly Detection. (arXiv:2308.12612v1 [cs.LG])
    The rapid growth of deep learning (DL) has spurred interest in enhancing log-based anomaly detection. This approach aims to extract meaning from log events (log message templates) and develop advanced DL models for anomaly detection. However, these DL methods face challenges like heavy reliance on training data, labels, and computational resources due to model complexity. In contrast, traditional machine learning and data mining techniques are less data-dependent and more efficient but less effective than DL. To make log-based anomaly detection more practical, the goal is to enhance traditional techniques to match DL's effectiveness. Previous research in a different domain (linking questions on Stack Overflow) suggests that optimized traditional techniques can rival state-of-the-art DL methods. Drawing inspiration from this concept, we conducted an empirical study. We optimized the unsupervised PCA (Principal Component Analysis), a traditional technique, by incorporating lightweight semantic-based log representation. This addresses the issue of unseen log events in training data, enhancing log representation. Our study compared seven log-based anomaly detection methods, including four DL-based, two traditional, and the optimized PCA technique, using public and industrial datasets. Results indicate that the optimized unsupervised PCA technique achieves similar effectiveness to advanced supervised/semi-supervised DL methods while being more stable with limited training data and resource-efficient. This demonstrates the adaptability and strength of traditional techniques through small yet impactful adaptations.
    An Intentional Forgetting-Driven Self-Healing Method For Deep Reinforcement Learning Systems. (arXiv:2308.12445v1 [cs.LG])
    Deep reinforcement learning (DRL) is increasingly applied in large-scale productions like Netflix and Facebook. As with most data-driven systems, DRL systems can exhibit undesirable behaviors due to environmental drifts, which often occur in constantly-changing production settings. Continual Learning (CL) is the inherent self-healing approach for adapting the DRL agent in response to the environment's conditions shifts. However, successive shifts of considerable magnitude may cause the production environment to drift from its original state. Recent studies have shown that these environmental drifts tend to drive CL into long, or even unsuccessful, healing cycles, which arise from inefficiencies such as catastrophic forgetting, warm-starting failure, and slow convergence. In this paper, we propose Dr. DRL, an effective self-healing approach for DRL systems that integrates a novel mechanism of intentional forgetting into vanilla CL to overcome its main issues. Dr. DRL deliberately erases the DRL system's minor behaviors to systematically prioritize the adaptation of the key problem-solving skills. Using well-established DRL algorithms, Dr. DRL is compared with vanilla CL on various drifted environments. Dr. DRL is able to reduce, on average, the healing time and fine-tuning episodes by, respectively, 18.74% and 17.72%. Dr. DRL successfully helps agents to adapt to 19.63% of drifted environments left unsolved by vanilla CL while maintaining and even enhancing by up to 45% the obtained rewards for drifted environments that are resolved by both approaches.
    Convergence of the Backward Deep BSDE Method with Applications to Optimal Stopping Problems. (arXiv:2210.04118v3 [math.PR] UPDATED)
    The optimal stopping problem is one of the core problems in financial markets, with broad applications such as pricing American and Bermudan options. The deep BSDE method [Han, Jentzen and E, PNAS, 115(34):8505-8510, 2018] has shown great power in solving high-dimensional forward-backward stochastic differential equations (FBSDEs), and inspired many applications. However, the method solves backward stochastic differential equations (BSDEs) in a forward manner, which can not be used for optimal stopping problems that in general require running BSDE backwardly. To overcome this difficulty, a recent paper [Wang, Chen, Sudjianto, Liu and Shen, arXiv:1807.06622, 2018] proposed the backward deep BSDE method to solve the optimal stopping problem. In this paper, we provide the rigorous theory for the backward deep BSDE method. Specifically, 1. We derive the a posteriori error estimation, i.e., the error of the numerical solution can be bounded by the training loss function; and; 2. We give an upper bound of the loss function, which can be sufficiently small subject to universal approximations. We give two numerical examples, which present consistent performance with the proved theory.
    A Continual Learning Approach for Cross-Domain White Blood Cell Classification. (arXiv:2308.12679v1 [cs.CV])
    Accurate classification of white blood cells in peripheral blood is essential for diagnosing hematological diseases. Due to constantly evolving clinical settings, data sources, and disease classifications, it is necessary to update machine learning classification models regularly for practical real-world use. Such models significantly benefit from sequentially learning from incoming data streams without forgetting previously acquired knowledge. However, models can suffer from catastrophic forgetting, causing a drop in performance on previous tasks when fine-tuned on new data. Here, we propose a rehearsal-based continual learning approach for class incremental and domain incremental scenarios in white blood cell classification. To choose representative samples from previous tasks, we employ exemplar set selection based on the model's predictions. This involves selecting the most confident samples and the most challenging samples identified through uncertainty estimation of the model. We thoroughly evaluated our proposed approach on three white blood cell classification datasets that differ in color, resolution, and class composition, including scenarios where new domains or new classes are introduced to the model with every task. We also test a long class incremental experiment with both new domains and new classes. Our results demonstrate that our approach outperforms established baselines in continual learning, including existing iCaRL and EWC methods for classifying white blood cells in cross-domain environments.
    Geodesic Mode Connectivity. (arXiv:2308.12666v1 [cs.LG])
    Mode connectivity is a phenomenon where trained models are connected by a path of low loss. We reframe this in the context of Information Geometry, where neural networks are studied as spaces of parameterized distributions with curved geometry. We hypothesize that shortest paths in these spaces, known as geodesics, correspond to mode-connecting paths in the loss landscape. We propose an algorithm to approximate geodesics and demonstrate that they achieve mode connectivity.
    PFL-GAN: When Client Heterogeneity Meets Generative Models in Personalized Federated Learning. (arXiv:2308.12454v1 [cs.LG])
    Recent advances of generative learning models are accompanied by the growing interest in federated learning (FL) based on generative adversarial network (GAN) models. In the context of FL, GAN can capture the underlying client data structure, and regenerate samples resembling the original data distribution without compromising the private raw data. Although most existing GAN-based FL works focus on training a global model, Personalized FL (PFL) sometimes can be more effective in view of client data heterogeneity in terms of distinct data sample distributions, feature spaces, and labels. To cope with client heterogeneity in GAN-based FL, we propose a novel GAN sharing and aggregation strategy for PFL. The proposed PFL-GAN addresses the client heterogeneity in different scenarios. More specially, we first learn the similarity among clients and then develop an weighted collaborative data aggregation. The empirical results through the rigorous experimentation on several well-known datasets demonstrate the effectiveness of PFL-GAN.
    Variational Information Pursuit with Large Language and Multimodal Models for Interpretable Predictions. (arXiv:2308.12562v1 [cs.LG])
    Variational Information Pursuit (V-IP) is a framework for making interpretable predictions by design by sequentially selecting a short chain of task-relevant, user-defined and interpretable queries about the data that are most informative for the task. While this allows for built-in interpretability in predictive models, applying V-IP to any task requires data samples with dense concept-labeling by domain experts, limiting the application of V-IP to small-scale tasks where manual data annotation is feasible. In this work, we extend the V-IP framework with Foundational Models (FMs) to address this limitation. More specifically, we use a two-step process, by first leveraging Large Language Models (LLMs) to generate a sufficiently large candidate set of task-relevant interpretable concepts, then using Large Multimodal Models to annotate each data sample by semantic similarity with each concept in the generated concept set. While other interpretable-by-design frameworks such as Concept Bottleneck Models (CBMs) require an additional step of removing repetitive and non-discriminative concepts to have good interpretability and test performance, we mathematically and empirically justify that, with a sufficiently informative and task-relevant query (concept) set, the proposed FM+V-IP method does not require any type of concept filtering. In addition, we show that FM+V-IP with LLM generated concepts can achieve better test performance than V-IP with human annotated concepts, demonstrating the effectiveness of LLMs at generating efficient query sets. Finally, when compared to other interpretable-by-design frameworks such as CBMs, FM+V-IP can achieve competitive test performance using fewer number of concepts/queries in both cases with filtered or unfiltered concept sets.
    MoCLIM: Towards Accurate Cancer Subtyping via Multi-Omics Contrastive Learning with Omics-Inference Modeling. (arXiv:2308.09725v2 [q-bio.GN] UPDATED)
    Precision medicine fundamentally aims to establish causality between dysregulated biochemical mechanisms and cancer subtypes. Omics-based cancer subtyping has emerged as a revolutionary approach, as different level of omics records the biochemical products of multistep processes in cancers. This paper focuses on fully exploiting the potential of multi-omics data to improve cancer subtyping outcomes, and hence developed MoCLIM, a representation learning framework. MoCLIM independently extracts the informative features from distinct omics modalities. Using a unified representation informed by contrastive learning of different omics modalities, we can well-cluster the subtypes, given cancer, into a lower latent space. This contrast can be interpreted as a projection of inter-omics inference observed in biological networks. Experimental results on six cancer datasets demonstrate that our approach significantly improves data fit and subtyping performance in fewer high-dimensional cancer instances. Moreover, our framework incorporates various medical evaluations as the final component, providing high interpretability in medical analysis.
    Expectation-Complete Graph Representations with Homomorphisms. (arXiv:2306.05838v2 [cs.LG] UPDATED)
    We investigate novel random graph embeddings that can be computed in expected polynomial time and that are able to distinguish all non-isomorphic graphs in expectation. Previous graph embeddings have limited expressiveness and either cannot distinguish all graphs or cannot be computed efficiently for every graph. To be able to approximate arbitrary functions on graphs, we are interested in efficient alternatives that become arbitrarily expressive with increasing resources. Our approach is based on Lov\'asz' characterisation of graph isomorphism through an infinite dimensional vector of homomorphism counts. Our empirical evaluation shows competitive results on several benchmark graph learning tasks.
    FedSoL: Bridging Global Alignment and Local Generality in Federated Learning. (arXiv:2308.12532v1 [cs.LG])
    Federated Learning (FL) aggregates locally trained models from individual clients to construct a global model. While FL enables learning a model with data privacy, it often suffers from significant performance degradation when client data distributions are heterogeneous. Many previous FL algorithms have addressed this issue by introducing various proximal restrictions. These restrictions aim to encourage global alignment by constraining the deviation of local learning from the global objective. However, they inherently limit local learning by interfering with the original local objectives. Recently, an alternative approach has emerged to improve local learning generality. By obtaining local models within a smooth loss landscape, this approach mitigates conflicts among different local objectives of the clients. Yet, it does not ensure stable global alignment, as local learning does not take the global objective into account. In this study, we propose Federated Stability on Learning (FedSoL), which combines both the concepts of global alignment and local generality. In FedSoL, the local learning seeks a parameter region robust against proximal perturbations. This strategy introduces an implicit proximal restriction effect in local learning while maintaining the original local objective for parameter update. Our experiments show that FedSoL consistently achieves state-of-the-art performance on various setups.
    Unified Data Management and Comprehensive Performance Evaluation for Urban Spatial-Temporal Prediction [Experiment, Analysis & Benchmark]. (arXiv:2308.12899v1 [cs.LG])
    The field of urban spatial-temporal prediction is advancing rapidly with the development of deep learning techniques and the availability of large-scale datasets. However, challenges persist in accessing and utilizing diverse urban spatial-temporal datasets from different sources and stored in different formats, as well as determining effective model structures and components with the proliferation of deep learning models. This work addresses these challenges and provides three significant contributions. Firstly, we introduce "atomic files", a unified storage format designed for urban spatial-temporal big data, and validate its effectiveness on 40 diverse datasets, simplifying data management. Secondly, we present a comprehensive overview of technological advances in urban spatial-temporal prediction models, guiding the development of robust models. Thirdly, we conduct extensive experiments using diverse models and datasets, establishing a performance leaderboard and identifying promising research directions. Overall, this work effectively manages urban spatial-temporal data, guides future efforts, and facilitates the development of accurate and efficient urban spatial-temporal prediction models. It can potentially make long-term contributions to urban spatial-temporal data management and prediction, ultimately leading to improved urban living standards.
    Disentanglement Learning via Topology. (arXiv:2308.12696v1 [cs.LG])
    We propose TopDis (Topological Disentanglement), a method for learning disentangled representations via adding multi-scale topological loss term. Disentanglement is a crucial property of data representations substantial for the explainability and robustness of deep learning models and a step towards high-level cognition. The state-of-the-art method based on VAE minimizes the total correlation of the joint distribution of latent variables. We take a different perspective on disentanglement by analyzing topological properties of data manifolds. In particular, we optimize the topological similarity for data manifolds traversals. To the best of our knowledge, our paper is the first one to propose a differentiable topological loss for disentanglement. Our experiments have shown that the proposed topological loss improves disentanglement scores such as MIG, FactorVAE score, SAP score and DCI disentanglement score with respect to state-of-the-art results. Our method works in an unsupervised manner, permitting to apply it for problems without labeled factors of variation. Additionally, we show how to use the proposed topological loss to find disentangled directions in a trained GAN.
    Feature Unlearning for Pre-trained GANs and VAEs. (arXiv:2303.05699v2 [cs.CV] UPDATED)
    We tackle the problem of feature unlearning from a pre-trained image generative model: GANs and VAEs. Unlike a common unlearning task where an unlearning target is a subset of the training set, we aim to unlearn a specific feature, such as hairstyle from facial images, from the pre-trained generative models. As the target feature is only presented in a local region of an image, unlearning the entire image from the pre-trained model may result in losing other details in the remaining region of the image. To specify which features to unlearn, we collect randomly generated images that contain the target features. We then identify a latent representation corresponding to the target feature and then use the representation to fine-tune the pre-trained model. Through experiments on MNIST and CelebA datasets, we show that target features are successfully removed while keeping the fidelity of the original models. Further experiments with an adversarial attack show that the unlearned model is more robust under the presence of malicious parties.
    Multivariate Time-Series Anomaly Detection with Contaminated Data: Application to Physiological Signals. (arXiv:2308.12563v1 [cs.LG])
    Mainstream unsupervised anomaly detection algorithms often excel in academic datasets, yet their real-world performance is restricted due to the controlled experimental conditions involving clean training data. Addressing the challenge of training with noise, a prevalent issue in practical anomaly detection, is frequently overlooked. In a pioneering endeavor, this study delves into the realm of label-level noise within sensory time-series anomaly detection (TSAD). This paper presents a novel and practical end-to-end unsupervised TSAD when the training data are contaminated with anomalies. The introduced approach, called TSAD-C, is devoid of access to abnormality labels during the training phase. TSAD-C encompasses three modules: a Decontaminator to rectify the abnormalities (aka noise) present in the training data, a Variable Dependency Modeling module to capture both long-term intra- and inter-variable dependencies within the decontaminated data that can be considered as a surrogate of the pure normal data, and an Anomaly Scoring module to detect anomalies. Our extensive experiments conducted on three widely used physiological datasets conclusively demonstrate that our approach surpasses existing methodologies, thus establishing a new state-of-the-art performance in the field.
    Evaluating the Vulnerabilities in ML systems in terms of adversarial attacks. (arXiv:2308.12918v1 [cs.LG])
    There have been recent adversarial attacks that are difficult to find. These new adversarial attacks methods may pose challenges to current deep learning cyber defense systems and could influence the future defense of cyberattacks. The authors focus on this domain in this research paper. They explore the consequences of vulnerabilities in AI systems. This includes discussing how they might arise, differences between randomized and adversarial examples and also potential ethical implications of vulnerabilities. Moreover, it is important to train the AI systems appropriately when they are in testing phase and getting them ready for broader use.
    Actuator Trajectory Planning for UAVs with Overhead Manipulator using Reinforcement Learning. (arXiv:2308.12843v1 [cs.RO])
    In this paper, we investigate the operation of an aerial manipulator system, namely an Unmanned Aerial Vehicle (UAV) equipped with a controllable arm with two degrees of freedom to carry out actuation tasks on the fly. Our solution is based on employing a Q-learning method to control the trajectory of the tip of the arm, also called \textit{end-effector}. More specifically, we develop a motion planning model based on Time To Collision (TTC), which enables a quadrotor UAV to navigate around obstacles while ensuring the manipulator's reachability. Additionally, we utilize a model-based Q-learning model to independently track and control the desired trajectory of the manipulator's end-effector, given an arbitrary baseline trajectory for the UAV platform. Such a combination enables a variety of actuation tasks such as high-altitude welding, structural monitoring and repair, battery replacement, gutter cleaning, sky scrapper cleaning, and power line maintenance in hard-to-reach and risky environments while retaining compatibility with flight control firmware. Our RL-based control mechanism results in a robust control strategy that can handle uncertainties in the motion of the UAV, offering promising performance. Specifically, our method achieves 92\% accuracy in terms of average displacement error (i.e. the mean distance between the target and obtained trajectory points) using Q-learning with 15,000 episodes
    Breaking the Communication-Privacy-Accuracy Tradeoff with $f$-Differential Privacy. (arXiv:2302.09624v2 [cs.CR] UPDATED)
    We consider a federated data analytics problem in which a server coordinates the collaborative data analysis of multiple users with privacy concerns and limited communication capability. The commonly adopted compression schemes introduce information loss into local data while improving communication efficiency, and it remains an open problem whether such discrete-valued mechanisms provide any privacy protection. In this paper, we study the local differential privacy guarantees of discrete-valued mechanisms with finite output space through the lens of $f$-differential privacy (DP). More specifically, we advance the existing literature by deriving tight $f$-DP guarantees for a variety of discrete-valued mechanisms, including the binomial noise and the binomial mechanisms that are proposed for privacy preservation, and the sign-based methods that are proposed for data compression, in closed-form expressions. We further investigate the amplification in privacy by sparsification and propose a ternary stochastic compressor. By leveraging compression for privacy amplification, we improve the existing methods by removing the dependency of accuracy (in terms of mean square error) on communication cost in the popular use case of distributed mean estimation, therefore breaking the three-way tradeoff between privacy, communication, and accuracy. Finally, we discuss the Byzantine resilience of the proposed mechanism and its application in federated learning.
    FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning. (arXiv:2308.12305v1 [cs.LG])
    Recently, foundation models have exhibited remarkable advancements in multi-modal learning. These models, equipped with millions (or billions) of parameters, typically require a substantial amount of data for finetuning. However, collecting and centralizing training data from diverse sectors becomes challenging due to distinct privacy regulations. Federated Learning (FL) emerges as a promising solution, enabling multiple clients to collaboratively train neural networks without centralizing their local data. To alleviate client computation burdens and communication overheads, previous works have adapted Parameter-efficient Finetuning (PEFT) methods for FL. Hereby, only a small fraction of the model parameters are optimized and communicated during federated communications. Nevertheless, most previous works have focused on a single modality and neglected one common phenomenon, i.e., the presence of data heterogeneity across the clients. Therefore, in this work, we propose a finetuning framework tailored to heterogeneous multi-modal FL, called Federated Dual-Aadapter Teacher (FedDAT). Specifically, our approach leverages a Dual-Adapter Teacher (DAT) to address data heterogeneity by regularizing the client local updates and applying Mutual Knowledge Distillation (MKD) for an efficient knowledge transfer. FedDAT is the first approach that enables an efficient distributed finetuning of foundation models for a variety of heterogeneous Vision-Language tasks. To demonstrate its effectiveness, we conduct extensive experiments on four multi-modality FL benchmarks with different types of data heterogeneity, where FedDAT substantially outperforms the existing centralized PEFT methods adapted for FL.
    Short Run Transit Route Planning Decision Support System Using a Deep Learning-Based Weighted Graph. (arXiv:2308.12828v1 [cs.AI])
    Public transport routing plays a crucial role in transit network design, ensuring a satisfactory level of service for passengers. However, current routing solutions rely on traditional operational research heuristics, which can be time-consuming to implement and lack the ability to provide quick solutions. Here, we propose a novel deep learning-based methodology for a decision support system that enables public transport (PT) planners to identify short-term route improvements rapidly. By seamlessly adjusting specific sections of routes between two stops during specific times of the day, our method effectively reduces times and enhances PT services. Leveraging diverse data sources such as GTFS and smart card data, we extract features and model the transportation network as a directed graph. Using self-supervision, we train a deep learning model for predicting lateness values for road segments. These lateness values are then utilized as edge weights in the transportation graph, enabling efficient path searching. Through evaluating the method on Tel Aviv, we are able to reduce times on more than 9\% of the routes. The improved routes included both intraurban and suburban routes showcasing a fact highlighting the model's versatility. The findings emphasize the potential of our data-driven decision support system to enhance public transport and city logistics, promoting greater efficiency and reliability in PT services.
    Efficient Sensor Placement from Regression with Sparse Gaussian Processes in Continuous and Discrete Spaces. (arXiv:2303.00028v4 [cs.RO] UPDATED)
    The sensor placement problem is a common problem that arises when monitoring correlated phenomena, such as temperature and precipitation. Existing approaches to this problem typically use discrete optimization methods, which are computationally expensive and cannot scale to large problems. We address the sensor placement problem in correlated environments by reducing it to a regression problem that can be efficiently solved using sparse Gaussian processes (SGPs). Our approach can handle both discrete sensor placement problems-where sensors are limited to a subset of a given set of locations-and continuous sensor placement problems-where sensors can be placed anywhere in a bounded continuous region. We further generalize our approach to handle sensors with a non-point field of view and integrated observations. Our experimental results on three real-world datasets show that our approach generates sensor placements that result in reconstruction quality that is consistently on par or better than the prior state-of-the-art approach while being significantly faster. Our computationally efficient approach enables both large-scale sensor placement and fast robotic sensor placement for informative path planning algorithms.
    TAI-GAN: Temporally and Anatomically Informed GAN for early-to-late frame conversion in dynamic cardiac PET motion correction. (arXiv:2308.12443v1 [eess.IV])
    The rapid tracer kinetics of rubidium-82 ($^{82}$Rb) and high variation of cross-frame distribution in dynamic cardiac positron emission tomography (PET) raise significant challenges for inter-frame motion correction, particularly for the early frames where conventional intensity-based image registration techniques are not applicable. Alternatively, a promising approach utilizes generative methods to handle the tracer distribution changes to assist existing registration methods. To improve frame-wise registration and parametric quantification, we propose a Temporally and Anatomically Informed Generative Adversarial Network (TAI-GAN) to transform the early frames into the late reference frame using an all-to-one mapping. Specifically, a feature-wise linear modulation layer encodes channel-wise parameters generated from temporal tracer kinetics information, and rough cardiac segmentations with local shifts serve as the anatomical information. We validated our proposed method on a clinical $^{82}$Rb PET dataset and found that our TAI-GAN can produce converted early frames with high image quality, comparable to the real reference frames. After TAI-GAN conversion, motion estimation accuracy and clinical myocardial blood flow (MBF) quantification were improved compared to using the original frames. Our code is published at https://github.com/gxq1998/TAI-GAN.
    BallGAN: 3D-aware Image Synthesis with a Spherical Background. (arXiv:2301.09091v3 [cs.CV] UPDATED)
    3D-aware GANs aim to synthesize realistic 3D scenes such that they can be rendered in arbitrary perspectives to produce images. Although previous methods produce realistic images, they suffer from unstable training or degenerate solutions where the 3D geometry is unnatural. We hypothesize that the 3D geometry is underdetermined due to the insufficient constraint, i.e., being classified as real image to the discriminator is not enough. To solve this problem, we propose to approximate the background as a spherical surface and represent a scene as a union of the foreground placed in the sphere and the thin spherical background. It reduces the degree of freedom in the background field. Accordingly, we modify the volume rendering equation and incorporate dedicated constraints to design a novel 3D-aware GAN framework named BallGAN. BallGAN has multiple advantages as follows. 1) It produces more reasonable 3D geometry; the images of a scene across different viewpoints have better photometric consistency and fidelity than the state-of-the-art methods. 2) The training becomes much more stable. 3) The foreground can be separately rendered on top of different arbitrary backgrounds.
    Min-Max Optimization under Delays. (arXiv:2307.06886v2 [cs.LG] UPDATED)
    Delays and asynchrony are inevitable in large-scale machine-learning problems where communication plays a key role. As such, several works have extensively analyzed stochastic optimization with delayed gradients. However, as far as we are aware, no analogous theory is available for min-max optimization, a topic that has gained recent popularity due to applications in adversarial robustness, game theory, and reinforcement learning. Motivated by this gap, we examine the performance of standard min-max optimization algorithms with delayed gradient updates. First, we show (empirically) that even small delays can cause prominent algorithms like Extra-gradient (\texttt{EG}) to diverge on simple instances for which \texttt{EG} guarantees convergence in the absence of delays. Our empirical study thus suggests the need for a careful analysis of delayed versions of min-max optimization algorithms. Accordingly, under suitable technical assumptions, we prove that Gradient Descent-Ascent (\texttt{GDA}) and \texttt{EG} with delayed updates continue to guarantee convergence to saddle points for convex-concave and strongly convex-strongly concave settings. Our complexity bounds reveal, in a transparent manner, the slow-down in convergence caused by delays.
    NeuralClothSim: Neural Deformation Fields Meet the Kirchhoff-Love Thin Shell Theory. (arXiv:2308.12970v1 [cs.GR])
    Cloth simulation is an extensively studied problem, with a plethora of solutions available in computer graphics literature. Existing cloth simulators produce realistic cloth deformations that obey different types of boundary conditions. Nevertheless, their operational principle remains limited in several ways: They operate on explicit surface representations with a fixed spatial resolution, perform a series of discretised updates (which bounds their temporal resolution), and require comparably large amounts of storage. Moreover, back-propagating gradients through the existing solvers is often not straightforward, which poses additional challenges when integrating them into modern neural architectures. In response to the limitations mentioned above, this paper takes a fundamentally different perspective on physically-plausible cloth simulation and re-thinks this long-standing problem: We propose NeuralClothSim, i.e., a new cloth simulation approach using thin shells, in which surface evolution is encoded in neural network weights. Our memory-efficient and differentiable solver operates on a new continuous coordinate-based representation of dynamic surfaces, i.e., neural deformation fields (NDFs); it supervises NDF evolution with the rules of the non-linear Kirchhoff-Love shell theory. NDFs are adaptive in the sense that they 1) allocate their capacity to the deformation details as the latter arise during the cloth evolution and 2) allow surface state queries at arbitrary spatial and temporal resolutions without retraining. We show how to train our NeuralClothSim solver while imposing hard boundary conditions and demonstrate multiple applications, such as material interpolation and simulation editing. The experimental results highlight the effectiveness of our formulation and its potential impact.
    False Information, Bots and Malicious Campaigns: Demystifying Elements of Social Media Manipulations. (arXiv:2308.12497v1 [cs.SI])
    The rapid spread of false information and persistent manipulation attacks on online social networks (OSNs), often for political, ideological, or financial gain, has affected the openness of OSNs. While researchers from various disciplines have investigated different manipulation-triggering elements of OSNs (such as understanding information diffusion on OSNs or detecting automated behavior of accounts), these works have not been consolidated to present a comprehensive overview of the interconnections among these elements. Notably, user psychology, the prevalence of bots, and their tactics in relation to false information detection have been overlooked in previous research. To address this research gap, this paper synthesizes insights from various disciplines to provide a comprehensive analysis of the manipulation landscape. By integrating the primary elements of social media manipulation (SMM), including false information, bots, and malicious campaigns, we extensively examine each SMM element. Through a systematic investigation of prior research, we identify commonalities, highlight existing gaps, and extract valuable insights in the field. Our findings underscore the urgent need for interdisciplinary research to effectively combat social media manipulations, and our systematization can guide future research efforts and assist OSN providers in ensuring the safety and integrity of their platforms.
    Continuous Reinforcement Learning-based Dynamic Difficulty Adjustment in a Visual Working Memory Game. (arXiv:2308.12726v1 [cs.HC])
    Dynamic Difficulty Adjustment (DDA) is a viable approach to enhance a player's experience in video games. Recently, Reinforcement Learning (RL) methods have been employed for DDA in non-competitive games; nevertheless, they rely solely on discrete state-action space with a small search space. In this paper, we propose a continuous RL-based DDA methodology for a visual working memory (VWM) game to handle the complex search space for the difficulty of memorization. The proposed RL-based DDA tailors game difficulty based on the player's score and game difficulty in the last trial. We defined a continuous metric for the difficulty of memorization. Then, we consider the task difficulty and the vector of difficulty-score as the RL's action and state, respectively. We evaluated the proposed method through a within-subject experiment involving 52 subjects. The proposed approach was compared with two rule-based difficulty adjustment methods in terms of player's score and game experience measured by a questionnaire. The proposed RL-based approach resulted in a significantly better game experience in terms of competence, tension, and negative and positive affect. Players also achieved higher scores and win rates. Furthermore, the proposed RL-based DDA led to a significantly less decline in the score in a 20-trial session.
    Machine learning in parameter estimation of nonlinear systems. (arXiv:2308.12393v1 [cs.LG])
    Accurately estimating parameters in complex nonlinear systems is crucial across scientific and engineering fields. We present a novel approach for parameter estimation using a neural network with the Huber loss function. This method taps into deep learning's abilities to uncover parameters governing intricate behaviors in nonlinear equations. We validate our approach using synthetic data and predefined functions that model system dynamics. By training the neural network with noisy time series data, it fine-tunes the Huber loss function to converge to accurate parameters. We apply our method to damped oscillators, Van der Pol oscillators, Lotka-Volterra systems, and Lorenz systems under multiplicative noise. The trained neural network accurately estimates parameters, evident from closely matching latent dynamics. Comparing true and estimated trajectories visually reinforces our method's precision and robustness. Our study underscores the Huber loss-guided neural network as a versatile tool for parameter estimation, effectively uncovering complex relationships in nonlinear systems. The method navigates noise and uncertainty adeptly, showcasing its adaptability to real-world challenges.
    Exploiting Time-Frequency Conformers for Music Audio Enhancement. (arXiv:2308.12599v1 [cs.SD])
    With the proliferation of video platforms on the internet, recording musical performances by mobile devices has become commonplace. However, these recordings often suffer from degradation such as noise and reverberation, which negatively impact the listening experience. Consequently, the necessity for music audio enhancement (referred to as music enhancement from this point onward), involving the transformation of degraded audio recordings into pristine high-quality music, has surged to augment the auditory experience. To address this issue, we propose a music enhancement system based on the Conformer architecture that has demonstrated outstanding performance in speech enhancement tasks. Our approach explores the attention mechanisms of the Conformer and examines their performance to discover the best approach for the music enhancement task. Our experimental results show that our proposed model achieves state-of-the-art performance on single-stem music enhancement. Furthermore, our system can perform general music enhancement with multi-track mixtures, which has not been examined in previous work.
    LORD: Leveraging Open-Set Recognition with Unknown Data. (arXiv:2308.12584v1 [cs.CV])
    Handling entirely unknown data is a challenge for any deployed classifier. Classification models are typically trained on a static pre-defined dataset and are kept in the dark for the open unassigned feature space. As a result, they struggle to deal with out-of-distribution data during inference. Addressing this task on the class-level is termed open-set recognition (OSR). However, most OSR methods are inherently limited, as they train closed-set classifiers and only adapt the downstream predictions to OSR. This work presents LORD, a framework to Leverage Open-set Recognition by exploiting unknown Data. LORD explicitly models open space during classifier training and provides a systematic evaluation for such approaches. We identify three model-agnostic training strategies that exploit background data and applied them to well-established classifiers. Due to LORD's extensive evaluation protocol, we consistently demonstrate improved recognition of unknown data. The benchmarks facilitate in-depth analysis across various requirement levels. To mitigate dependency on extensive and costly background datasets, we explore mixup as an off-the-shelf data generation technique. Our experiments highlight mixup's effectiveness as a substitute for background datasets. Lightweight constraints on mixup synthesis further improve OSR performance.
    The GENEA Challenge 2023: A large scale evaluation of gesture generation models in monadic and dyadic settings. (arXiv:2308.12646v1 [cs.HC])
    This paper reports on the GENEA Challenge 2023, in which participating teams built speech-driven gesture-generation systems using the same speech and motion dataset, followed by a joint evaluation. This year's challenge provided data on both sides of a dyadic interaction, allowing teams to generate full-body motion for an agent given its speech (text and audio) and the speech and motion of the interlocutor. We evaluated 12 submissions and 2 baselines together with held-out motion-capture data in several large-scale user studies. The studies focused on three aspects: 1) the human-likeness of the motion, 2) the appropriateness of the motion for the agent's own speech whilst controlling for the human-likeness of the motion, and 3) the appropriateness of the motion for the behaviour of the interlocutor in the interaction, using a setup that controls for both the human-likeness of the motion and the agent's own speech. We found a large span in human-likeness between challenge submissions, with a few systems rated close to human mocap. Appropriateness seems far from being solved, with most submissions performing in a narrow range slightly above chance, far behind natural motion. The effect of the interlocutor is even more subtle, with submitted systems at best performing barely above chance. Interestingly, a dyadic system being highly appropriate for agent speech does not necessarily imply high appropriateness for the interlocutor. Additional material is available via the project website at https://svito-zar.github.io/GENEAchallenge2023/ .
    IP-UNet: Intensity Projection UNet Architecture for 3D Medical Volume Segmentation. (arXiv:2308.12761v1 [eess.IV])
    CNNs have been widely applied for medical image analysis. However, limited memory capacity is one of the most common drawbacks of processing high-resolution 3D volumetric data. 3D volumes are usually cropped or downsized first before processing, which can result in a loss of resolution, increase class imbalance, and affect the performance of the segmentation algorithms. In this paper, we propose an end-to-end deep learning approach called IP-UNet. IP-UNet is a UNet-based model that performs multi-class segmentation on Intensity Projection (IP) of 3D volumetric data instead of the memory-consuming 3D volumes. IP-UNet uses limited memory capability for training without losing the original 3D image resolution. We compare the performance of three models in terms of segmentation accuracy and computational cost: 1) Slice-by-slice 2D segmentation of the CT scan images using a conventional 2D UNet model. 2) IP-UNet that operates on data obtained by merging the extracted Maximum Intensity Projection (MIP), Closest Vessel Projection (CVP), and Average Intensity Projection (AvgIP) representations of the source 3D volumes, then applying the UNet model on the output IP images. 3) 3D-UNet model directly reads the 3D volumes constructed from a series of CT scan images and outputs the 3D volume of the predicted segmentation. We test the performance of these methods on 3D volumetric images for automatic breast calcification detection. Experimental results show that IP-Unet can achieve similar segmentation accuracy with 3D-Unet but with much better performance. It reduces the training time by 70\% and memory consumption by 92\%.
    An Efficient Data Analysis Method for Big Data using Multiple-Model Linear Regression. (arXiv:2308.12691v1 [cs.LG])
    This paper introduces a new data analysis method for big data using a newly defined regression model named multiple model linear regression(MMLR), which separates input datasets into subsets and construct local linear regression models of them. The proposed data analysis method is shown to be more efficient and flexible than other regression based methods. This paper also proposes an approximate algorithm to construct MMLR models based on $(\epsilon,\delta)$-estimator, and gives mathematical proofs of the correctness and efficiency of MMLR algorithm, of which the time complexity is linear with respect to the size of input datasets. This paper also empirically implements the method on both synthetic and real-world datasets, the algorithm shows to have comparable performance to existing regression methods in many cases, while it takes almost the shortest time to provide a high prediction accuracy.
    On the Consistency of Average Embeddings for Item Recommendation. (arXiv:2308.12767v1 [cs.IR])
    A prevalent practice in recommender systems consists of averaging item embeddings to represent users or higher-level concepts in the same embedding space. This paper investigates the relevance of such a practice. For this purpose, we propose an expected precision score, designed to measure the consistency of an average embedding relative to the items used for its construction. We subsequently analyze the mathematical expression of this score in a theoretical setting with specific assumptions, as well as its empirical behavior on real-world data from music streaming services. Our results emphasize that real-world averages are less consistent for recommendation, which paves the way for future research to better align real-world embeddings with assumptions from our theoretical setting.
    FIESTA: Autoencoders for accurate fiber segmentation in tractography. (arXiv:2212.00143v3 [cs.CV] UPDATED)
    White matter bundle segmentation is a cornerstone of modern tractography to study the brain's structural connectivity in domains such as neurological disorders, neurosurgery, and aging. In this study, we present FIESTA (FIbEr Segmentation in Tractography using Autoencoders), a reliable and robust, fully automated, and easily semi-automatically calibrated pipeline based on deep autoencoders that can dissect and fully populate white matter bundles. This pipeline is built upon previous works that demonstrated how autoencoders can be used successfully for streamline filtering, bundle segmentation, and streamline generation in tractography. Our proposed method improves bundle segmentation coverage by recovering hard-to-track bundles with generative sampling through the latent space seeding of the subject bundle and the atlas bundle. A latent space of streamlines is learned using autoencoder-based modeling combined with contrastive learning. Using an atlas of bundles in standard space (MNI), our proposed method segments new tractograms using the autoencoder latent distance between each tractogram streamline and its closest neighbor bundle in the atlas of bundles. Intra-subject bundle reliability is improved by recovering hard-to-track streamlines, using the autoencoder to generate new streamlines that increase the spatial coverage of each bundle while remaining anatomically correct. Results show that our method is more reliable than state-of-the-art automated virtual dissection methods such as RecoBundles, RecoBundlesX, TractSeg, White Matter Analysis and XTRACT. Our framework allows for the transition from one anatomical bundle definition to another with marginal calibration efforts. Overall, these results show that our framework improves the practicality and usability of current state-of-the-art bundle segmentation framework.
    Augmenting medical image classifiers with synthetic data from latent diffusion models. (arXiv:2308.12453v1 [cs.CV])
    While hundreds of artificial intelligence (AI) algorithms are now approved or cleared by the US Food and Drugs Administration (FDA), many studies have shown inconsistent generalization or latent bias, particularly for underrepresented populations. Some have proposed that generative AI could reduce the need for real data, but its utility in model development remains unclear. Skin disease serves as a useful case study in synthetic image generation due to the diversity of disease appearance, particularly across the protected attribute of skin tone. Here we show that latent diffusion models can scalably generate images of skin disease and that augmenting model training with these data improves performance in data-limited settings. These performance gains saturate at synthetic-to-real image ratios above 10:1 and are substantially smaller than the gains obtained from adding real images. As part of our analysis, we generate and analyze a new dataset of 458,920 synthetic images produced using several generation strategies. Our results suggest that synthetic data could serve as a force-multiplier for model development, but the collection of diverse real-world data remains the most important step to improve medical AI algorithms.
    Fat Shattering, Joint Measurability, and PAC Learnability of POVM Hypothesis Classes. (arXiv:2308.12304v1 [stat.ML])
    We characterize learnability for quantum measurement classes by establishing matching necessary and sufficient conditions for their PAC learnability, along with corresponding sample complexity bounds, in the setting where the learner is given access only to prepared quantum states. We first probe the results from previous works on this setting. We show that the empirical risk defined in previous works and matching the definition in the classical theory fails to satisfy the uniform convergence property enjoyed in the classical setting for some learnable classes. Moreover, we show that VC dimension generalization upper bounds in previous work are frequently infinite, even for finite-dimensional POVM classes. To surmount the failure of the standard ERM to satisfy uniform convergence, we define a new learning rule -- denoised ERM. We show this to be a universal learning rule for POVM and probabilistically observed concept classes, and the condition for it to satisfy uniform convergence is finite fat shattering dimension of the class. We give quantitative sample complexity upper and lower bounds for learnability in terms of finite fat-shattering dimension and a notion of approximate finite partitionability into approximately jointly measurable subsets, which allow for sample reuse. We then show that finite fat shattering dimension implies finite coverability by approximately jointly measurable subsets, leading to our matching conditions. We also show that every measurement class defined on a finite-dimensional Hilbert space is PAC learnable. We illustrate our results on several example POVM classes.
    Don't blame Dataset Shift! Shortcut Learning due to Gradients and Cross Entropy. (arXiv:2308.12553v1 [cs.LG])
    Common explanations for shortcut learning assume that the shortcut improves prediction under the training distribution but not in the test distribution. Thus, models trained via the typical gradient-based optimization of cross-entropy, which we call default-ERM, utilize the shortcut. However, even when the stable feature determines the label in the training distribution and the shortcut does not provide any additional information, like in perception tasks, default-ERM still exhibits shortcut learning. Why are such solutions preferred when the loss for default-ERM can be driven to zero using the stable feature alone? By studying a linear perception task, we show that default-ERM's preference for maximizing the margin leads to models that depend more on the shortcut than the stable feature, even without overparameterization. This insight suggests that default-ERM's implicit inductive bias towards max-margin is unsuitable for perception tasks. Instead, we develop an inductive bias toward uniform margins and show that this bias guarantees dependence only on the perfect stable feature in the linear perception task. We develop loss functions that encourage uniform-margin solutions, called margin control (MARG-CTRL). MARG-CTRL mitigates shortcut learning on a variety of vision and language tasks, showing that better inductive biases can remove the need for expensive two-stage shortcut-mitigating methods in perception tasks.
    PromptBench: Towards Evaluating the Robustness of Large Language Models on Adversarial Prompts. (arXiv:2306.04528v3 [cs.CL] UPDATED)
    The increasing reliance on Large Language Models (LLMs) across academia and industry necessitates a comprehensive understanding of their robustness to prompts. In response to this vital need, we introduce PromptBench, a robustness benchmark designed to measure LLMs' resilience to adversarial prompts. This study uses a plethora of adversarial textual attacks targeting prompts across multiple levels: character, word, sentence, and semantic. These prompts are then employed in diverse tasks, such as sentiment analysis, natural language inference, reading comprehension, machine translation, and math problem-solving. Our study generates 4,032 adversarial prompts, meticulously evaluated over 8 tasks and 13 datasets, with 567,084 test samples in total. Our findings demonstrate that contemporary LLMs are vulnerable to adversarial prompts. Furthermore, we present comprehensive analysis to understand the mystery behind prompt robustness and its transferability. We then offer insightful robustness analysis and pragmatic recommendations for prompt composition, beneficial to both researchers and everyday users. We make our code, prompts, and methodologies to generate adversarial prompts publicly accessible, thereby enabling and encouraging collaborative exploration in this pivotal field: https://github.com/microsoft/promptbench.
    Towards Hierarchical Regional Transformer-based Multiple Instance Learning. (arXiv:2308.12634v1 [cs.CV])
    The classification of gigapixel histopathology images with deep multiple instance learning models has become a critical task in digital pathology and precision medicine. In this work, we propose a Transformer-based multiple instance learning approach that replaces the traditional learned attention mechanism with a regional, Vision Transformer inspired self-attention mechanism. We present a method that fuses regional patch information to derive slide-level predictions and show how this regional aggregation can be stacked to hierarchically process features on different distance levels. To increase predictive accuracy, especially for datasets with small, local morphological features, we introduce a method to focus the image processing on high attention regions during inference. Our approach is able to significantly improve performance over the baseline on two histopathology datasets and points towards promising directions for further research.
    Symplectic model reduction of Hamiltonian systems using data-driven quadratic manifolds. (arXiv:2305.15490v2 [math.NA] UPDATED)
    This work presents two novel approaches for the symplectic model reduction of high-dimensional Hamiltonian systems using data-driven quadratic manifolds. Classical symplectic model reduction approaches employ linear symplectic subspaces for representing the high-dimensional system states in a reduced-dimensional coordinate system. While these approximations respect the symplectic nature of Hamiltonian systems, linear basis approximations can suffer from slowly decaying Kolmogorov $N$-width, especially in wave-type problems, which then requires a large basis size. We propose two different model reduction methods based on recently developed quadratic manifolds, each presenting its own advantages and limitations. The addition of quadratic terms to the state approximation, which sits at the heart of the proposed methodologies, enables us to better represent intrinsic low-dimensionality in the problem at hand. Both approaches are effective for issuing predictions in settings well outside the range of their training data while providing more accurate solutions than the linear symplectic reduced-order models.
    Deploying Deep Reinforcement Learning Systems: A Taxonomy of Challenges. (arXiv:2308.12438v1 [cs.LG])
    Deep reinforcement learning (DRL), leveraging Deep Learning (DL) in reinforcement learning, has shown significant potential in achieving human-level autonomy in a wide range of domains, including robotics, computer vision, and computer games. This potential justifies the enthusiasm and growing interest in DRL in both academia and industry. However, the community currently focuses mostly on the development phase of DRL systems, with little attention devoted to DRL deployment. In this paper, we propose an empirical study on Stack Overflow (SO), the most popular Q&A forum for developers, to uncover and understand the challenges practitioners faced when deploying DRL systems. Specifically, we categorized relevant SO posts by deployment platforms: server/cloud, mobile/embedded system, browser, and game engine. After filtering and manual analysis, we examined 357 SO posts about DRL deployment, investigated the current state, and identified the challenges related to deploying DRL systems. Then, we investigate the prevalence and difficulty of these challenges. Results show that the general interest in DRL deployment is growing, confirming the study's relevance and importance. Results also show that DRL deployment is more difficult than other DRL issues. Additionally, we built a taxonomy of 31 unique challenges in deploying DRL to different platforms. On all platforms, RL environment-related challenges are the most popular, and communication-related challenges are the most difficult among practitioners. We hope our study inspires future research and helps the community overcome the most common and difficult challenges practitioners face when deploying DRL systems.
    Inferring gender from name: a large scale performance evaluation study. (arXiv:2308.12381v1 [cs.CL])
    A person's gender is a crucial piece of information when performing research across a wide range of scientific disciplines, such as medicine, sociology, political science, and economics, to name a few. However, in increasing instances, especially given the proliferation of big data, gender information is not readily available. In such cases researchers need to infer gender from readily available information, primarily from persons' names. While inferring gender from name may raise some ethical questions, the lack of viable alternatives means that researchers have to resort to such approaches when the goal justifies the means - in the majority of such studies the goal is to examine patterns and determinants of gender disparities. The necessity of name-to-gender inference has generated an ever-growing domain of algorithmic approaches and software products. These approaches have been used throughout the world in academia, industry, governmental and non-governmental organizations. Nevertheless, the existing approaches have yet to be systematically evaluated and compared, making it challenging to determine the optimal approach for future research. In this work, we conducted a large scale performance evaluation of existing approaches for name-to-gender inference. Analysis are performed using a variety of large annotated datasets of names. We further propose two new hybrid approaches that achieve better performance than any single existing approach.
    A multiobjective continuation method to compute the regularization path of deep neural networks. (arXiv:2308.12044v2 [cs.LG] UPDATED)
    Sparsity is a highly desired feature in deep neural networks (DNNs) since it ensures numerical efficiency, improves the interpretability of models (due to the smaller number of relevant features), and robustness. In machine learning approaches based on linear models, it is well known that there exists a connecting path between the sparsest solution in terms of the $\ell^1$ norm (i.e., zero weights) and the non-regularized solution, which is called the regularization path. Very recently, there was a first attempt to extend the concept of regularization paths to DNNs by means of treating the empirical loss and sparsity ($\ell^1$ norm) as two conflicting criteria and solving the resulting multiobjective optimization problem. However, due to the non-smoothness of the $\ell^1$ norm and the high number of parameters, this approach is not very efficient from a computational perspective. To overcome this limitation, we present an algorithm that allows for the approximation of the entire Pareto front for the above-mentioned objectives in a very efficient manner. We present numerical examples using both deterministic and stochastic gradients. We furthermore demonstrate that knowledge of the regularization path allows for a well-generalizing network parametrization.
    Match-And-Deform: Time Series Domain Adaptation through Optimal Transport and Temporal Alignment. (arXiv:2308.12686v1 [cs.LG])
    While large volumes of unlabeled data are usually available, associated labels are often scarce. The unsupervised domain adaptation problem aims at exploiting labels from a source domain to classify data from a related, yet different, target domain. When time series are at stake, new difficulties arise as temporal shifts may appear in addition to the standard feature distribution shift. In this paper, we introduce the Match-And-Deform (MAD) approach that aims at finding correspondences between the source and target time series while allowing temporal distortions. The associated optimization problem simultaneously aligns the series thanks to an optimal transport loss and the time stamps through dynamic time warping. When embedded into a deep neural network, MAD helps learning new representations of time series that both align the domains and maximize the discriminative power of the network. Empirical studies on benchmark datasets and remote sensing data demonstrate that MAD makes meaningful sample-to-sample pairing and time shift estimation, reaching similar or better classification performance than state-of-the-art deep time series domain adaptation strategies.
    Scenimefy: Learning to Craft Anime Scene via Semi-Supervised Image-to-Image Translation. (arXiv:2308.12968v1 [cs.CV])
    Automatic high-quality rendering of anime scenes from complex real-world images is of significant practical value. The challenges of this task lie in the complexity of the scenes, the unique features of anime style, and the lack of high-quality datasets to bridge the domain gap. Despite promising attempts, previous efforts are still incompetent in achieving satisfactory results with consistent semantic preservation, evident stylization, and fine details. In this study, we propose Scenimefy, a novel semi-supervised image-to-image translation framework that addresses these challenges. Our approach guides the learning with structure-consistent pseudo paired data, simplifying the pure unsupervised setting. The pseudo data are derived uniquely from a semantic-constrained StyleGAN leveraging rich model priors like CLIP. We further apply segmentation-guided data selection to obtain high-quality pseudo supervision. A patch-wise contrastive style loss is introduced to improve stylization and fine details. Besides, we contribute a high-resolution anime scene dataset to facilitate future research. Our extensive experiments demonstrate the superiority of our method over state-of-the-art baselines in terms of both perceptual quality and quantitative performance.
    Exact Manifold Gaussian Variational Bayes. (arXiv:2210.14598v3 [stat.ML] UPDATED)
    We propose an optimization algorithm for Variational Inference (VI) in complex models. Our approach relies on natural gradient updates where the variational space is a Riemann manifold. We develop an efficient algorithm for Gaussian Variational Inference that implicitly satisfies the positive definite constraint on the variational covariance matrix. Our Exact manifold Gaussian Variational Bayes (EMGVB) provides exact but simple update rules and is straightforward to implement. Due to its black-box nature, EMGVB stands as a ready-to-use solution for VI in complex models. Over five datasets, we empirically validate our feasible approach on different statistical, econometric, and deep learning models, discussing its performance with respect to baseline methods.
    Low-count Time Series Anomaly Detection. (arXiv:2308.12925v1 [cs.LG])
    Low-count time series describe sparse or intermittent events, which are prevalent in large-scale online platforms that capture and monitor diverse data types. Several distinct challenges surface when modelling low-count time series, particularly low signal-to-noise ratios (when anomaly signatures are provably undetectable), and non-uniform performance (when average metrics are not representative of local behaviour). The time series anomaly detection community currently lacks explicit tooling and processes to model and reliably detect anomalies in these settings. We address this gap by introducing a novel generative procedure for creating benchmark datasets comprising of low-count time series with anomalous segments. Via a mixture of theoretical and empirical analysis, our work explains how widely-used algorithms struggle with the distribution overlap between normal and anomalous segments. In order to mitigate this shortcoming, we then leverage our findings to demonstrate how anomaly score smoothing consistently improves performance. The practical utility of our analysis and recommendation is validated on a real-world dataset containing sales data for retail stores.
    Synthesize High-dimensional Longitudinal Electronic Health Records via Hierarchical Autoregressive Language Model. (arXiv:2304.02169v2 [cs.LG] UPDATED)
    Synthetic electronic health records (EHRs) that are both realistic and preserve privacy can serve as an alternative to real EHRs for machine learning (ML) modeling and statistical analysis. However, generating high-fidelity and granular electronic health record (EHR) data in its original, highly-dimensional form poses challenges for existing methods due to the complexities inherent in high-dimensional data. In this paper, we propose Hierarchical Autoregressive Language mOdel (HALO) for generating longitudinal high-dimensional EHR, which preserve the statistical properties of real EHR and can be used to train accurate ML models without privacy concerns. Our HALO method, designed as a hierarchical autoregressive model, generates a probability density function of medical codes, clinical visits, and patient records, allowing for the generation of realistic EHR data in its original, unaggregated form without the need for variable selection or aggregation. Additionally, our model also produces high-quality continuous variables in a longitudinal and probabilistic manner. We conducted extensive experiments and demonstrate that HALO can generate high-fidelity EHR data with high-dimensional disease code probabilities (d > 10,000), disease co-occurrence probabilities within visits (d > 1,000,000), and conditional probabilities across consecutive visits (d > 5,000,000) and achieve above 0.9 R2 correlation in comparison to real EHR data. This performance then enables downstream ML models trained on its synthetic data to achieve comparable accuracy to models trained on real data (0.938 AUROC with HALO data vs. 0.943 with real data). Finally, using a combination of real and synthetic data enhances the accuracy of ML models beyond that achieved by using only real EHR data.
    SieveNet: Selecting Point-Based Features for Mesh Networks. (arXiv:2308.12530v1 [cs.CV])
    Meshes are widely used in 3D computer vision and graphics, but their irregular topology poses challenges in applying them to existing neural network architectures. Recent advances in mesh neural networks turn to remeshing and push the boundary of pioneer methods that solely take the raw meshes as input. Although the remeshing offers a regular topology that significantly facilitates the design of mesh network architectures, features extracted from such remeshed proxies may struggle to retain the underlying geometry faithfully, limiting the subsequent neural network's capacity. To address this issue, we propose SieveNet, a novel paradigm that takes into account both the regular topology and the exact geometry. Specifically, this method utilizes structured mesh topology from remeshing and accurate geometric information from distortion-aware point sampling on the surface of the original mesh. Furthermore, our method eliminates the need for hand-crafted feature engineering and can leverage off-the-shelf network architectures such as the vision transformer. Comprehensive experimental results on classification and segmentation tasks well demonstrate the effectiveness and superiority of our method.
    Improving Generative Model-based Unfolding with Schr\"{o}dinger Bridges. (arXiv:2308.12351v1 [hep-ph])
    Machine learning-based unfolding has enabled unbinned and high-dimensional differential cross section measurements. Two main approaches have emerged in this research area: one based on discriminative models and one based on generative models. The main advantage of discriminative models is that they learn a small correction to a starting simulation while generative models scale better to regions of phase space with little data. We propose to use Schroedinger Bridges and diffusion models to create SBUnfold, an unfolding approach that combines the strengths of both discriminative and generative models. The key feature of SBUnfold is that its generative model maps one set of events into another without having to go through a known probability density as is the case for normalizing flows and standard diffusion models. We show that SBUnfold achieves excellent performance compared to state of the art methods on a synthetic Z+jets dataset.
    Evaluation of ChatGPT on Biomedical Tasks: A Zero-Shot Comparison with Fine-Tuned Generative Transformers. (arXiv:2306.04504v3 [cs.CL] UPDATED)
    ChatGPT is a large language model developed by OpenAI. Despite its impressive performance across various tasks, no prior work has investigated its capability in the biomedical domain yet. To this end, this paper aims to evaluate the performance of ChatGPT on various benchmark biomedical tasks, such as relation extraction, document classification, question answering, and summarization. To the best of our knowledge, this is the first work that conducts an extensive evaluation of ChatGPT in the biomedical domain. Interestingly, we find based on our evaluation that in biomedical datasets that have smaller training sets, zero-shot ChatGPT even outperforms the state-of-the-art fine-tuned generative transformer models, such as BioGPT and BioBART. This suggests that ChatGPT's pre-training on large text corpora makes it quite specialized even in the biomedical domain. Our findings demonstrate that ChatGPT has the potential to be a valuable tool for various tasks in the biomedical domain that lack large annotated data.
    Masked Autoencoders are Efficient Class Incremental Learners. (arXiv:2308.12510v1 [cs.CV])
    Class Incremental Learning (CIL) aims to sequentially learn new classes while avoiding catastrophic forgetting of previous knowledge. We propose to use Masked Autoencoders (MAEs) as efficient learners for CIL. MAEs were originally designed to learn useful representations through reconstructive unsupervised learning, and they can be easily integrated with a supervised loss for classification. Moreover, MAEs can reliably reconstruct original input images from randomly selected patches, which we use to store exemplars from past tasks more efficiently for CIL. We also propose a bilateral MAE framework to learn from image-level and embedding-level fusion, which produces better-quality reconstructed images and more stable representations. Our experiments confirm that our approach performs better than the state-of-the-art on CIFAR-100, ImageNet-Subset, and ImageNet-Full. The code is available at https://github.com/scok30/MAE-CIL .
    Zero-delay Consistent Signal Reconstruction from Streamed Multivariate Time Series. (arXiv:2308.12459v1 [eess.SP])
    Digitalizing real-world analog signals typically involves sampling in time and discretizing in amplitude. Subsequent signal reconstructions inevitably incur an error that depends on the amplitude resolution and the temporal density of the acquired samples. From an implementation viewpoint, consistent signal reconstruction methods have proven a profitable error-rate decay as the sampling rate increases. Despite that, these results are obtained under offline settings. Therefore, a research gap exists regarding methods for consistent signal reconstruction from data streams. This paper presents a method that consistently reconstructs streamed multivariate time series of quantization intervals under a zero-delay response requirement. On the other hand, previous work has shown that the temporal dependencies within univariate time series can be exploited to reduce the roughness of zero-delay signal reconstructions. This work shows that the spatiotemporal dependencies within multivariate time series can also be exploited to achieve improved results. Specifically, the spatiotemporal dependencies of the multivariate time series are learned, with the assistance of a recurrent neural network, to reduce the roughness of the signal reconstruction on average while ensuring consistency. Our experiments show that our proposed method achieves a favorable error-rate decay with the sampling rate compared to a similar but non-consistent reconstruction.
    A Huber Loss Minimization Approach to Byzantine Robust Federated Learning. (arXiv:2308.12581v1 [cs.LG])
    Federated learning systems are susceptible to adversarial attacks. To combat this, we introduce a novel aggregator based on Huber loss minimization, and provide a comprehensive theoretical analysis. Under independent and identically distributed (i.i.d) assumption, our approach has several advantages compared to existing methods. Firstly, it has optimal dependence on $\epsilon$, which stands for the ratio of attacked clients. Secondly, our approach does not need precise knowledge of $\epsilon$. Thirdly, it allows different clients to have unequal data sizes. We then broaden our analysis to include non-i.i.d data, such that clients have slightly different distributions.
    Label Budget Allocation in Multi-Task Learning. (arXiv:2308.12949v1 [cs.LG])
    The cost of labeling data often limits the performance of machine learning systems. In multi-task learning, related tasks provide information to each other and improve overall performance, but the label cost can vary among tasks. How should the label budget (i.e. the amount of money spent on labeling) be allocated among different tasks to achieve optimal multi-task performance? We are the first to propose and formally define the label budget allocation problem in multi-task learning and to empirically show that different budget allocation strategies make a big difference to its performance. We propose a Task-Adaptive Budget Allocation algorithm to robustly generate the optimal budget allocation adaptive to different multi-task learning settings. Specifically, we estimate and then maximize the extent of new information obtained from the allocated budget as a proxy for multi-task learning performance. Experiments on PASCAL VOC and Taskonomy demonstrate the efficacy of our approach over other widely used heuristic labeling strategies.
    Solving Forward and Inverse Problems of Contact Mechanics using Physics-Informed Neural Networks. (arXiv:2308.12716v1 [math.NA])
    This paper explores the ability of physics-informed neural networks (PINNs) to solve forward and inverse problems of contact mechanics for small deformation elasticity. We deploy PINNs in a mixed-variable formulation enhanced by output transformation to enforce Dirichlet and Neumann boundary conditions as hard constraints. Inequality constraints of contact problems, namely Karush-Kuhn-Tucker (KKT) type conditions, are enforced as soft constraints by incorporating them into the loss function during network training. To formulate the loss function contribution of KKT constraints, existing approaches applied to elastoplasticity problems are investigated and we explore a nonlinear complementarity problem (NCP) function, namely Fischer-Burmeister, which possesses advantageous characteristics in terms of optimization. Based on the Hertzian contact problem, we show that PINNs can serve as pure partial differential equation (PDE) solver, as data-enhanced forward model, as inverse solver for parameter identification, and as fast-to-evaluate surrogate model. Furthermore, we demonstrate the importance of choosing proper hyperparameters, e.g. loss weights, and a combination of Adam and L-BFGS-B optimizers aiming for better results in terms of accuracy and training time.
    Out of the Box Thinking: Improving Customer Lifetime Value Modelling via Expert Routing and Game Whale Detection. (arXiv:2308.12729v1 [cs.IR])
    Customer lifetime value (LTV) prediction is essential for mobile game publishers trying to optimize the advertising investment for each user acquisition based on the estimated worth. In mobile games, deploying microtransactions is a simple yet effective monetization strategy, which attracts a tiny group of game whales who splurge on in-game purchases. The presence of such game whales may impede the practicality of existing LTV prediction models, since game whales' purchase behaviours always exhibit varied distribution from general users. Consequently, identifying game whales can open up new opportunities to improve the accuracy of LTV prediction models. However, little attention has been paid to applying game whale detection in LTV prediction, and existing works are mainly specialized for the long-term LTV prediction with the assumption that the high-quality user features are available, which is not applicable in the UA stage. In this paper, we propose ExpLTV, a novel multi-task framework to perform LTV prediction and game whale detection in a unified way. In ExpLTV, we first innovatively design a deep neural network-based game whale detector that can not only infer the intrinsic order in accordance with monetary value, but also precisely identify high spenders (i.e., game whales) and low spenders. Then, by treating the game whale detector as a gating network to decide the different mixture patterns of LTV experts assembling, we can thoroughly leverage the shared information and scenario-specific information (i.e., game whales modelling and low spenders modelling). Finally, instead of separately designing a purchase rate estimator for two tasks, we design a shared estimator that can preserve the inner task relationships. The superiority of ExpLTV is further validated via extensive experiments on three industrial datasets.
    Fall Detection using Knowledge Distillation Based Long short-term memory for Offline Embedded and Low Power Devices. (arXiv:2308.12481v1 [eess.SP])
    This paper presents a cost-effective, low-power approach to unintentional fall detection using knowledge distillation-based LSTM (Long Short-Term Memory) models to significantly improve accuracy. With a primary focus on analyzing time-series data collected from various sensors, the solution offers real-time detection capabilities, ensuring prompt and reliable identification of falls. The authors investigate fall detection models that are based on different sensors, comparing their accuracy rates and performance. Furthermore, they employ the technique of knowledge distillation to enhance the models' precision, resulting in refined accurate configurations that consume lower power. As a result, this proposed solution presents a compelling avenue for the development of energy-efficient fall detection systems for future advancements in this critical domain.
    Efficient Adaptive Activation Rounding for Post-Training Quantization. (arXiv:2208.11945v3 [cs.LG] UPDATED)
    Post-training quantization attracts increasing attention due to its convenience in deploying quantized neural networks. Although rounding-to-nearest remains the prevailing method for DNN quantization, prior research has demonstrated its suboptimal nature when applied to weight quantization. They propose optimizing weight rounding schemes by leveraging output error rather than the traditional weight quantization error. Our study reveals that similar rounding challenges also extend to activation quantization. Despite the easy generalization, the challenges lie in the dynamic nature of activation. Adaptive rounding is expected for varying activations and the method is subjected to runtime overhead. To tackle this, we propose the AQuant quantization framework with a novel perspective to reduce output error by adjusting rounding schemes of activations. Instead of using the constant rounding border 0.5 of the rounding-to-nearest operation, we make the border become a function w.r.t. the activation value to change the activation rounding by the adaptive border. To deal with the runtime overhead, we use a coarse-grained version of the border function. Finally, we introduce our framework to optimize the border function. Extensive experiments show that AQuant achieves notable improvements compared to state-of-the-art works and pushes the accuracy of ResNet-18 up to 60.31% under the 2-bit weight and activation quantization.
    Tackling Face Verification Edge Cases: In-Depth Analysis and Human-Machine Fusion Approach. (arXiv:2304.08134v4 [cs.CV] UPDATED)
    Nowadays, face recognition systems surpass human performance on several datasets. However, there are still edge cases that the machine can't correctly classify. This paper investigates the effect of a combination of machine and human operators in the face verification task. First, we look closer at the edge cases for several state-of-the-art models to discover common datasets' challenging settings. Then, we conduct a study with 60 participants on these selected tasks with humans and provide an extensive analysis. Finally, we demonstrate that combining machine and human decisions can further improve the performance of state-of-the-art face verification systems on various benchmark datasets. Code and data are publicly available on GitHub.
    HypBO: Expert-Guided Chemist-in-the-Loop Bayesian Search for New Materials. (arXiv:2308.11787v2 [cs.LG] UPDATED)
    Robotics and automation offer massive accelerations for solving intractable, multivariate scientific problems such as materials discovery, but the available search spaces can be dauntingly large. Bayesian optimization (BO) has emerged as a popular sample-efficient optimization engine, thriving in tasks where no analytic form of the target function/property is known. Here we exploit expert human knowledge in the form of hypotheses to direct Bayesian searches more quickly to promising regions of chemical space. Previous methods have used underlying distributions derived from existing experimental measurements, which is unfeasible for new, unexplored scientific tasks. Also, such distributions cannot capture intricate hypotheses. Our proposed method, which we call HypBO, uses expert human hypotheses to generate an improved seed of samples. Unpromising seeds are automatically discounted, while promising seeds are used to augment the surrogate model data, thus achieving better-informed sampling. This process continues in a global versus local search fashion, organized in a bilevel optimization framework. We validate the performance of our method on a range of synthetic functions and demonstrate its practical utility on a real chemical design task where the use of expert hypotheses accelerates the search performance significantly.
    CALM : A Multi-task Benchmark for Comprehensive Assessment of Language Model Bias. (arXiv:2308.12539v1 [cs.CL])
    As language models (LMs) become increasingly powerful, it is important to quantify and compare them for sociodemographic bias with potential for harm. Prior bias measurement datasets are sensitive to perturbations in their manually designed templates, therefore unreliable. To achieve reliability, we introduce the Comprehensive Assessment of Language Model bias (CALM), a benchmark dataset to quantify bias in LMs across three tasks. We integrate 16 existing datasets across different domains, such as Wikipedia and news articles, to filter 224 templates from which we construct a dataset of 78,400 examples. We compare the diversity of CALM with prior datasets on metrics such as average semantic similarity, and variation in template length, and test the sensitivity to small perturbations. We show that our dataset is more diverse and reliable than previous datasets, thus better capture the breadth of linguistic variation required to reliably evaluate model bias. We evaluate 20 large language models including six prominent families of LMs such as Llama-2. In two LM series, OPT and Bloom, we found that larger parameter models are more biased than lower parameter models. We found the T0 series of models to be the least biased. Furthermore, we noticed a tradeoff between gender and racial bias with increasing model size in some model series. The code is available at https://github.com/vipulgupta1011/CALM.
    BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection. (arXiv:2308.12439v1 [cs.CR])
    We present a novel defense, against backdoor attacks on Deep Neural Networks (DNNs), wherein adversaries covertly implant malicious behaviors (backdoors) into DNNs. Our defense falls within the category of post-development defenses that operate independently of how the model was generated. The proposed defense is built upon a novel reverse engineering approach that can directly extract backdoor functionality of a given backdoored model to a backdoor expert model. The approach is straightforward -- finetuning the backdoored model over a small set of intentionally mislabeled clean samples, such that it unlearns the normal functionality while still preserving the backdoor functionality, and thus resulting in a model (dubbed a backdoor expert model) that can only recognize backdoor inputs. Based on the extracted backdoor expert model, we show the feasibility of devising highly accurate backdoor input detectors that filter out the backdoor inputs during model inference. Further augmented by an ensemble strategy with a finetuned auxiliary model, our defense, BaDExpert (Backdoor Input Detection with Backdoor Expert), effectively mitigates 16 SOTA backdoor attacks while minimally impacting clean utility. The effectiveness of BaDExpert has been verified on multiple datasets (CIFAR10, GTSRB and ImageNet) across various model architectures (ResNet, VGG, MobileNetV2 and Vision Transformer).
    Federated Learning in Big Model Era: Domain-Specific Multimodal Large Models. (arXiv:2308.11217v3 [cs.LG] UPDATED)
    Multimodal data, which can comprehensively perceive and recognize the physical world, has become an essential path towards general artificial intelligence. However, multimodal large models trained on public datasets often underperform in specific industrial domains. This paper proposes a multimodal federated learning framework that enables multiple enterprises to utilize private domain data to collaboratively train large models for vertical domains, achieving intelligent services across scenarios. The authors discuss in-depth the strategic transformation of federated learning in terms of intelligence foundation and objectives in the era of big model, as well as the new challenges faced in heterogeneous data, model aggregation, performance and cost trade-off, data privacy, and incentive mechanism. The paper elaborates a case study of leading enterprises contributing multimodal data and expert knowledge to city safety operation management , including distributed deployment and efficient coordination of the federated learning platform, technical innovations on data quality improvement based on large model capabilities and efficient joint fine-tuning approaches. Preliminary experiments show that enterprises can enhance and accumulate intelligent capabilities through multimodal model federated learning, thereby jointly creating an smart city model that provides high-quality intelligent services covering energy infrastructure safety, residential community security, and urban operation management. The established federated learning cooperation ecosystem is expected to further aggregate industry, academia, and research resources, realize large models in multiple vertical domains, and promote the large-scale industrial application of artificial intelligence and cutting-edge research on multimodal federated learning.
    Conditional Kernel Imitation Learning for Continuous State Environments. (arXiv:2308.12573v1 [cs.LG])
    Imitation Learning (IL) is an important paradigm within the broader reinforcement learning (RL) methodology. Unlike most of RL, it does not assume availability of reward-feedback. Reward inference and shaping are known to be difficult and error-prone methods particularly when the demonstration data comes from human experts. Classical methods such as behavioral cloning and inverse reinforcement learning are highly sensitive to estimation errors, a problem that is particularly acute in continuous state space problems. Meanwhile, state-of-the-art IL algorithms convert behavioral policy learning problems into distribution-matching problems which often require additional online interaction data to be effective. In this paper, we consider the problem of imitation learning in continuous state space environments based solely on observed behavior, without access to transition dynamics information, reward structure, or, most importantly, any additional interactions with the environment. Our approach is based on the Markov balance equation and introduces a novel conditional kernel density estimation-based imitation learning framework. It involves estimating the environment's transition dynamics using conditional kernel density estimators and seeks to satisfy the probabilistic balance equations for the environment. We establish that our estimators satisfy basic asymptotic consistency requirements. Through a series of numerical experiments on continuous state benchmark environments, we show consistently superior empirical performance over many state-of-the-art IL algorithms.
    Evolution of ESG-focused DLT Research: An NLP Analysis of the Literature. (arXiv:2308.12420v1 [cs.IR])
    Distributed Ledger Technologies (DLTs) have rapidly evolved, necessitating comprehensive insights into their diverse components. However, a systematic literature review that emphasizes the Environmental, Sustainability, and Governance (ESG) components of DLT remains lacking. To bridge this gap, we selected 107 seed papers to build a citation network of 63,083 references and refined it to a corpus of 24,539 publications for analysis. Then, we labeled the named entities in 46 papers according to twelve top-level categories derived from an established technology taxonomy and enhanced the taxonomy by pinpointing DLT's ESG elements. Leveraging transformer-based language models, we fine-tuned a pre-trained language model for a Named Entity Recognition (NER) task using our labeled dataset. We used our fine-tuned language model to distill the corpus to 505 key papers, facilitating a literature review via named entities and temporal graph analysis on DLT evolution in the context of ESG. Our contributions are a methodology to conduct a machine learning-driven systematic literature review in the DLT field, placing a special emphasis on ESG aspects. Furthermore, we present a first-of-its-kind NER dataset, composed of 54,808 named entities, designed for DLT and ESG-related explorations.
    Learning Only On Boundaries: a Physics-Informed Neural operator for Solving Parametric Partial Differential Equations in Complex Geometries. (arXiv:2308.12939v1 [cs.LG])
    Recently deep learning surrogates and neural operators have shown promise in solving partial differential equations (PDEs). However, they often require a large amount of training data and are limited to bounded domains. In this work, we present a novel physics-informed neural operator method to solve parametrized boundary value problems without labeled data. By reformulating the PDEs into boundary integral equations (BIEs), we can train the operator network solely on the boundary of the domain. This approach reduces the number of required sample points from $O(N^d)$ to $O(N^{d-1})$, where $d$ is the domain's dimension, leading to a significant acceleration of the training process. Additionally, our method can handle unbounded problems, which are unattainable for existing physics-informed neural networks (PINNs) and neural operators. Our numerical experiments show the effectiveness of parametrized complex geometries and unbounded problems.
    Anderson Acceleration For Bioinformatics-Based Machine Learning. (arXiv:2302.00347v2 [cs.LG] UPDATED)
    Anderson acceleration (AA) is a well-known method for accelerating the convergence of iterative algorithms, with applications in various fields including deep learning and optimization. Despite its popularity in these areas, the effectiveness of AA in classical machine learning classifiers has not been thoroughly studied. Tabular data, in particular, presents a unique challenge for deep learning models, and classical machine learning models are known to perform better in these scenarios. However, the convergence analysis of these models has received limited attention. To address this gap in research, we implement a support vector machine (SVM) classifier variant that incorporates AA to speed up convergence. We evaluate the performance of our SVM with and without Anderson acceleration on several datasets from the biology domain and demonstrate that the use of AA significantly improves convergence and reduces the training loss as the number of iterations increases. Our findings provide a promising perspective on the potential of Anderson acceleration in the training of simple machine learning classifiers and underscore the importance of further research in this area. By showing the effectiveness of AA in this setting, we aim to inspire more studies that explore the applications of AA in classical machine learning.
    Quantized Radio Map Estimation Using Tensor and Deep Generative Models. (arXiv:2303.01770v2 [eess.SP] UPDATED)
    Spectrum cartography (SC), also known as radio map estimation (RME), aims at crafting multi-domain (e.g., frequency and space) radio power propagation maps from limited sensor measurements. While early methods often lacked theoretical support, recent works have demonstrated that radio maps can be provably recovered using low-dimensional models -- such as the block-term tensor decomposition (BTD) model and certain deep generative models (DGMs) -- of the high-dimensional multi-domain radio signals. However, these existing provable SC approaches assume that sensors send real-valued (full-resolution) measurements to the fusion center, which is unrealistic. This work puts forth a quantized SC framework that generalizes the BTD and DGM-based SC to scenarios where heavily quantized sensor measurements are used. A maximum likelihood estimation (MLE)-based SC framework under a Gaussian quantizer is proposed. Recoverability of the radio map using the MLE criterion are characterized under realistic conditions, e.g., imperfect radio map modeling and noisy measurements. Simulations and real-data experiments are used to showcase the effectiveness of the proposed approach.
    Riemannian Hamiltonian methods for min-max optimization on manifolds. (arXiv:2204.11418v3 [math.OC] UPDATED)
    In this paper, we study min-max optimization problems on Riemannian manifolds. We introduce a Riemannian Hamiltonian function, minimization of which serves as a proxy for solving the original min-max problems. Under the Riemannian Polyak--{\L}ojasiewicz condition on the Hamiltonian function, its minimizer corresponds to the desired min-max saddle point. We also provide cases where this condition is satisfied. For geodesic-bilinear optimization in particular, solving the proxy problem leads to the correct search direction towards global optimality, which becomes challenging with the min-max formulation. To minimize the Hamiltonian function, we propose Riemannian Hamiltonian methods (RHM) and present their convergence analyses. We extend RHM to include consensus regularization and to the stochastic setting. We illustrate the efficacy of the proposed RHM in applications such as subspace robust Wasserstein distance, robust training of neural networks, and generative adversarial networks.
    IPA: Inference Pipeline Adaptation to Achieve High Accuracy and Cost-Efficiency. (arXiv:2308.12871v1 [cs.DC])
    Efficiently optimizing multi-model inference pipelines for fast, accurate, and cost-effective inference is a crucial challenge in ML production systems, given their tight end-to-end latency requirements. To simplify the exploration of the vast and intricate trade-off space of accuracy and cost in inference pipelines, providers frequently opt to consider one of them. However, the challenge lies in reconciling accuracy and cost trade-offs. To address this challenge and propose a solution to efficiently manage model variants in inference pipelines, we present IPA, an online deep-learning Inference Pipeline Adaptation system that efficiently leverages model variants for each deep learning task. Model variants are different versions of pre-trained models for the same deep learning task with variations in resource requirements, latency, and accuracy. IPA dynamically configures batch size, replication, and model variants to optimize accuracy, minimize costs, and meet user-defined latency SLAs using Integer Programming. It supports multi-objective settings for achieving different trade-offs between accuracy and cost objectives while remaining adaptable to varying workloads and dynamic traffic patterns. Extensive experiments on a Kubernetes implementation with five real-world inference pipelines demonstrate that IPA improves normalized accuracy by up to 35% with a minimal cost increase of less than 5%.
    Renormalizing Diffusion Models. (arXiv:2308.12355v1 [hep-th])
    We explain how to use diffusion models to learn inverse renormalization group flows of statistical and quantum field theories. Diffusion models are a class of machine learning models which have been used to generate samples from complex distributions, such as the distribution of natural images, by learning the inverse process to a diffusion process which adds noise to the data until the distribution of the data is pure noise. Nonperturbative renormalization group schemes can naturally be written as diffusion processes in the space of fields. We combine these observations in a concrete framework for building ML-based models for studying field theories, in which the models learn the inverse process to an explicitly-specified renormalization group scheme. We detail how these models define a class of adaptive bridge (or parallel tempering) samplers for lattice field theory. Because renormalization group schemes have a physical meaning, we provide explicit prescriptions for how to compare results derived from models associated to several different renormalization group schemes of interest. We also explain how to use diffusion models in a variational method to find ground states of quantum systems. We apply some of our methods to numerically find RG flows of interacting statistical field theories. From the perspective of machine learning, our work provides an interpretation of multiscale diffusion models, and gives physically-inspired suggestions for diffusion models which should have novel properties.
    Predicting Drug Solubility Using Different Machine Learning Methods -- Linear Regression Model with Extracted Chemical Features vs Graph Convolutional Neural Network. (arXiv:2308.12325v1 [q-bio.QM])
    Predicting the solubility of given molecules is an important task in the pharmaceutical industry, and consequently this is a well-studied topic. In this research, we revisited this problem with the advantage of modern computing resources. We applied two machine learning models, a linear regression model and a graph convolutional neural network model, on multiple experimental datasets. Both methods can make reasonable predictions while the GCNN model had the best performance. However, the current GCNN model is a black box, while feature importance analysis from the linear regression model offers more insights into the underlying chemical influences. Using the linear regression model, we show how each functional group affects the overall solubility. Ultimately, knowing how chemical structure influences chemical properties is crucial when designing new drugs. Future work should aim to combine the high performance of GCNNs with the interpretability of linear regression, unlocking new advances in next generation high throughput screening.
    FOSA: Full Information Maximum Likelihood (FIML) Optimized Self-Attention Imputation for Missing Data. (arXiv:2308.12388v1 [cs.LG])
    In data imputation, effectively addressing missing values is pivotal, especially in intricate datasets. This paper delves into the FIML Optimized Self-attention (FOSA) framework, an innovative approach that amalgamates the strengths of Full Information Maximum Likelihood (FIML) estimation with the capabilities of self-attention neural networks. Our methodology commences with an initial estimation of missing values via FIML, subsequently refining these estimates by leveraging the self-attention mechanism. Our comprehensive experiments on both simulated and real-world datasets underscore FOSA's pronounced advantages over traditional FIML techniques, encapsulating facets of accuracy, computational efficiency, and adaptability to diverse data structures. Intriguingly, even in scenarios where the Structural Equation Model (SEM) might be mis-specified, leading to suboptimal FIML estimates, the robust architecture of FOSA's self-attention component adeptly rectifies and optimizes the imputation outcomes. Our empirical tests reveal that FOSA consistently delivers commendable predictions, even in the face of up to 40% random missingness, highlighting its robustness and potential for wide-scale applications in data imputation.
    Towards Top-Down Automated Development in Limited Scopes: A Neuro-Symbolic Framework from Expressibles to Executables. (arXiv:2209.01566v4 [cs.SE] UPDATED)
    Deep code generation is a topic of deep learning for software engineering (DL4SE), which adopts neural models to generate code for the intended functions. Since end-to-end neural methods lack domain knowledge and software hierarchy awareness, they tend to perform poorly w.r.t project-level tasks. To systematically explore the potential improvements of code generation, we let it participate in the whole top-down development from \emph{expressibles} to \emph{executables}, which is possible in limited scopes. In the process, it benefits from massive samples, features, and knowledge. As the foundation, we suggest building a taxonomy on code data, namely code taxonomy, leveraging the categorization of code information. Moreover, we introduce a three-layer semantic pyramid (SP) to associate text data and code data. It identifies the information of different abstraction levels, and thus introduces the domain knowledge on development and reveals the hierarchy of software. Furthermore, we propose a semantic pyramid framework (SPF) as the approach, focusing on software of high modularity and low complexity. SPF divides the code generation process into stages and reserves spots for potential interactions. In addition, we conceived preliminary applications in software development to confirm the neuro-symbolic framework.
    Towards Realistic Unsupervised Fine-tuning with CLIP. (arXiv:2308.12919v1 [cs.CV])
    The emergence of vision-language models (VLMs), such as CLIP, has spurred a significant research effort towards their application for downstream supervised learning tasks. Although some previous studies have explored the unsupervised fine-tuning of CLIP, they often rely on prior knowledge in the form of class names associated with ground truth labels. In this paper, we delve into a realistic unsupervised fine-tuning scenario by assuming that the unlabeled data might contain out-of-distribution samples from unknown classes. Furthermore, we emphasize the importance of simultaneously enhancing out-of-distribution detection capabilities alongside the recognition of instances associated with predefined class labels. To tackle this problem, we present a simple, efficient, and effective fine-tuning approach called Universal Entropy Optimization (UEO). UEO leverages sample-level confidence to approximately minimize the conditional entropy of confident instances and maximize the marginal entropy of less confident instances. Apart from optimizing the textual prompts, UEO also incorporates optimization of channel-wise affine transformations within the visual branch of CLIP. Through extensive experiments conducted across 15 domains and 4 different types of prior knowledge, we demonstrate that UEO surpasses baseline methods in terms of both generalization and out-of-distribution detection.
    Intentionally-underestimated Value Function at Terminal State for Temporal-difference Learning with Mis-designed Reward. (arXiv:2308.12772v1 [cs.RO])
    Robot control using reinforcement learning has become popular, but its learning process generally terminates halfway through an episode for safety and time-saving reasons. This study addresses the problem of the most popular exception handling that temporal-difference (TD) learning performs at such termination. That is, by forcibly assuming zero value after termination, unintentionally implicit underestimation or overestimation occurs, depending on the reward design in the normal states. When the episode is terminated due to task failure, the failure may be highly valued with the unintentional overestimation, and the wrong policy may be acquired. Although this problem can be avoided by paying attention to the reward design, it is essential in practical use of TD learning to review the exception handling at termination. This paper therefore proposes a method to intentionally underestimate the value after termination to avoid learning failures due to the unintentional overestimation. In addition, the degree of underestimation is adjusted according to the degree of stationarity at termination, thereby preventing excessive exploration due to the intentional underestimation. Simulations and real robot experiments showed that the proposed method can stably obtain the optimal policies for various tasks and reward designs. https://youtu.be/AxXr8uFOe7M
    Augmenting Reinforcement Learning with Transformer-based Scene Representation Learning for Decision-making of Autonomous Driving. (arXiv:2208.12263v2 [cs.LG] UPDATED)
    Decision-making for urban autonomous driving is challenging due to the stochastic nature of interactive traffic participants and the complexity of road structures. Although reinforcement learning (RL)-based decision-making scheme is promising to handle urban driving scenarios, it suffers from low sample efficiency and poor adaptability. In this paper, we propose Scene-Rep Transformer to improve the RL decision-making capabilities with better scene representation encoding and sequential predictive latent distillation. Specifically, a multi-stage Transformer (MST) encoder is constructed to model not only the interaction awareness between the ego vehicle and its neighbors but also intention awareness between the agents and their candidate routes. A sequential latent Transformer (SLT) with self-supervised learning objectives is employed to distill the future predictive information into the latent scene representation, in order to reduce the exploration space and speed up training. The final decision-making module based on soft actor-critic (SAC) takes as input the refined latent scene representation from the Scene-Rep Transformer and outputs driving actions. The framework is validated in five challenging simulated urban scenarios with dense traffic, and its performance is manifested quantitatively by the substantial improvements in data efficiency and performance in terms of success rate, safety, and efficiency. The qualitative results reveal that our framework is able to extract the intentions of neighbor agents to help make decisions and deliver more diversified driving behaviors.
    An Efficient Distributed Multi-Agent Reinforcement Learning for EV Charging Network Control. (arXiv:2308.12921v1 [cs.MA])
    The increasing trend in adopting electric vehicles (EVs) will significantly impact the residential electricity demand, which results in an increased risk of transformer overload in the distribution grid. To mitigate such risks, there are urgent needs to develop effective EV charging controllers. Currently, the majority of the EV charge controllers are based on a centralized approach for managing individual EVs or a group of EVs. In this paper, we introduce a decentralized Multi-agent Reinforcement Learning (MARL) charging framework that prioritizes the preservation of privacy for EV owners. We employ the Centralized Training Decentralized Execution-Deep Deterministic Policy Gradient (CTDE-DDPG) scheme, which provides valuable information to users during training while maintaining privacy during execution. Our results demonstrate that the CTDE framework improves the performance of the charging network by reducing the network costs. Moreover, we show that the Peak-to-Average Ratio (PAR) of the total demand is reduced, which, in turn, reduces the risk of transformer overload during the peak hours.
    PruMUX: Augmenting Data Multiplexing with Model Compression. (arXiv:2305.14706v2 [cs.LG] UPDATED)
    As language models increase in size by the day, methods for efficient inference are critical to leveraging their capabilities for various applications. Prior work has investigated techniques like model pruning, knowledge distillation, and data multiplexing to increase model throughput without sacrificing accuracy. In this paper, we combine two such methods -- structured pruning and data multiplexing -- to compound the speedup gains obtained by either method. Our approach, PruMUX, obtains up to 7.5-29.5X throughput improvement over BERT-base model with accuracy threshold from 80% to 74%. We further study various combinations of parameters (such as sparsity and multiplexing factor) in the two techniques to provide a comprehensive analysis of the tradeoff between accuracy and throughput in the resulting models. We then propose Auto-PruMUX, a meta-level model that can predict the high-performance parameters for pruning and multiplexing given a desired accuracy loss budget, providing a practical method to leverage the combination effectively.
    UNISOUND System for VoxCeleb Speaker Recognition Challenge 2023. (arXiv:2308.12526v1 [eess.AS])
    This report describes the UNISOUND submission for Track1 and Track2 of VoxCeleb Speaker Recognition Challenge 2023 (VoxSRC 2023). We submit the same system on Track 1 and Track 2, which is trained with only VoxCeleb2-dev. Large-scale ResNet and RepVGG architectures are developed for the challenge. We propose a consistency-aware score calibration method, which leverages the stability of audio voiceprints in similarity score by a Consistency Measure Factor (CMF). CMF brings a huge performance boost in this challenge. Our final system is a fusion of six models and achieves the first place in Track 1 and second place in Track 2 of VoxSRC 2023. The minDCF of our submission is 0.0855 and the EER is 1.5880%.
    A temporally and spatially local spike-based backpropagation algorithm to enable training in hardware. (arXiv:2207.09755v2 [cs.NE] UPDATED)
    Spiking Neural Networks (SNNs) have emerged as a hardware efficient architecture for classification tasks. The challenge of spike-based encoding has been the lack of a universal training mechanism performed entirely using spikes. There have been several attempts to adopt the powerful backpropagation (BP) technique used in non-spiking artificial neural networks (ANN): (1) SNNs can be trained by externally computed numerical gradients. (2) A major advancement towards native spike-based learning has been the use of approximate Backpropagation using spike-time dependent plasticity (STDP) with phased forward/backward passes. However, the transfer of information between such phases for gradient and weight update calculation necessitates external memory and computational access. This is a challenge for standard neuromorphic hardware implementations. In this paper, we propose a stochastic SNN based Back-Prop (SSNN-BP) algorithm that utilizes a composite neuron to simultaneously compute the forward pass activations and backward pass gradients explicitly with spikes. Although signed gradient values are a challenge for spike-based representation, we tackle this by splitting the gradient signal into positive and negative streams. We show that our method approaches BP ANN baseline with sufficiently long spike-trains. Finally, we show that the well-performing softmax cross-entropy loss function can be implemented through inhibitory lateral connections enforcing a Winner Take All (WTA) rule. Our SNN with a 2-layer network shows excellent generalization through comparable performance to ANNs with equivalent architecture and regularization parameters on static image datasets like MNIST, Fashion-MNIST, Extended MNIST, and temporally encoded image datasets like Neuromorphic MNIST datasets. Thus, SSNN-BP enables BP compatible with purely spike-based neuromorphic hardware.
    Algorithmic progress in computer vision. (arXiv:2212.05153v4 [cs.CV] UPDATED)
    We investigate algorithmic progress in image classification on ImageNet, perhaps the most well-known test bed for computer vision. We estimate a model, informed by work on neural scaling laws, and infer a decomposition of progress into the scaling of compute, data, and algorithms. Using Shapley values to attribute performance improvements, we find that algorithmic improvements have been roughly as important as the scaling of compute for progress computer vision. Our estimates indicate that algorithmic innovations mostly take the form of compute-augmenting algorithmic advances (which enable researchers to get better performance from less compute), not data-augmenting algorithmic advances. We find that compute-augmenting algorithmic advances are made at a pace more than twice as fast as the rate usually associated with Moore's law. In particular, we estimate that compute-augmenting innovations halve compute requirements every nine months (95\% confidence interval: 4 to 25 months).
    Not Only Rewards But Also Constraints: Applications on Legged Robot Locomotion. (arXiv:2308.12517v1 [cs.RO])
    Several earlier studies have shown impressive control performance in complex robotic systems by designing the controller using a neural network and training it with model-free reinforcement learning. However, these outstanding controllers with natural motion style and high task performance are developed through extensive reward engineering, which is a highly laborious and time-consuming process of designing numerous reward terms and determining suitable reward coefficients. In this work, we propose a novel reinforcement learning framework for training neural network controllers for complex robotic systems consisting of both rewards and constraints. To let the engineers appropriately reflect their intent to constraints and handle them with minimal computation overhead, two constraint types and an efficient policy optimization algorithm are suggested. The learning framework is applied to train locomotion controllers for several legged robots with different morphology and physical attributes to traverse challenging terrains. Extensive simulation and real-world experiments demonstrate that performant controllers can be trained with significantly less reward engineering, by tuning only a single reward coefficient. Furthermore, a more straightforward and intuitive engineering process can be utilized, thanks to the interpretability and generalizability of constraints. The summary video is available at https://youtu.be/KAlm3yskhvM.
    On the Generalization of PINNs outside the training domain and the Hyperparameters influencing it. (arXiv:2302.07557v2 [cs.LG] UPDATED)
    Physics-Informed Neural Networks (PINNs) are Neural Network architectures trained to emulate solutions of differential equations without the necessity of solution data. They are currently ubiquitous in the scientific literature due to their flexible and promising settings. However, very little of the available research provides practical studies that aim for a better quantitative understanding of such architecture and its functioning. In this paper, we perform an empirical analysis of the behavior of PINN predictions outside their training domain. The primary goal is to investigate the scenarios in which a PINN can provide consistent predictions outside the training area. Thereinafter, we assess whether the algorithmic setup of PINNs can influence their potential for generalization and showcase the respective effect on the prediction. The results obtained in this study returns insightful and at times counterintuitive perspectives which can be highly relevant for architectures which combines PINNs with domain decomposition and/or adaptive training strategies.
    Single-shot Bayesian approximation for neural networks. (arXiv:2308.12785v1 [cs.LG])
    Deep neural networks (NNs) are known for their high-prediction performances. However, NNs are prone to yield unreliable predictions when encountering completely new situations without indicating their uncertainty. Bayesian variants of NNs (BNNs), such as Monte Carlo (MC) dropout BNNs, do provide uncertainty measures and simultaneously increase the prediction performance. The only disadvantage of BNNs is their higher computation time during test time because they rely on a sampling approach. Here we present a single-shot MC dropout approximation that preserves the advantages of BNNs while being as fast as NNs. Our approach is based on moment propagation (MP) and allows to analytically approximate the expected value and the variance of the MC dropout signal for commonly used layers in NNs, i.e. convolution, max pooling, dense, softmax, and dropout layers. The MP approach can convert an NN into a BNN without re-training given the NN has been trained with standard dropout. We evaluate our approach on different benchmark datasets and a simulated toy example in a classification and regression setting. We demonstrate that our single-shot MC dropout approximation resembles the point estimate and the uncertainty estimate of the predictive distribution that is achieved with an MC approach, while being fast enough for real-time deployments of BNNs. We show that using part of the saved time to combine our MP approach with deep ensemble techniques does further improve the uncertainty measures.
    Trustworthy Representation Learning Across Domains. (arXiv:2308.12315v1 [cs.LG])
    As AI systems have obtained significant performance to be deployed widely in our daily live and human society, people both enjoy the benefits brought by these technologies and suffer many social issues induced by these systems. To make AI systems good enough and trustworthy, plenty of researches have been done to build guidelines for trustworthy AI systems. Machine learning is one of the most important parts for AI systems and representation learning is the fundamental technology in machine learning. How to make the representation learning trustworthy in real-world application, e.g., cross domain scenarios, is very valuable and necessary for both machine learning and AI system fields. Inspired by the concepts in trustworthy AI, we proposed the first trustworthy representation learning across domains framework which includes four concepts, i.e, robustness, privacy, fairness, and explainability, to give a comprehensive literature review on this research direction. Specifically, we first introduce the details of the proposed trustworthy framework for representation learning across domains. Second, we provide basic notions and comprehensively summarize existing methods for the trustworthy framework from four concepts. Finally, we conclude this survey with insights and discussions on future research directions.
    Optimizing Neural Network Scale for ECG Classification. (arXiv:2308.12492v1 [cs.LG])
    We study scaling convolutional neural networks (CNNs), specifically targeting Residual neural networks (ResNet), for analyzing electrocardiograms (ECGs). Although ECG signals are time-series data, CNN-based models have been shown to outperform other neural networks with different architectures in ECG analysis. However, most previous studies in ECG analysis have overlooked the importance of network scaling optimization, which significantly improves performance. We explored and demonstrated an efficient approach to scale ResNet by examining the effects of crucial parameters, including layer depth, the number of channels, and the convolution kernel size. Through extensive experiments, we found that a shallower network, a larger number of channels, and smaller kernel sizes result in better performance for ECG classifications. The optimal network scale might differ depending on the target task, but our findings provide insight into obtaining more efficient and accurate models with fewer computing resources or less time. In practice, we demonstrate that a narrower search space based on our findings leads to higher performance.
    Test-Time Adaptation for Visual Document Understanding. (arXiv:2206.07240v2 [cs.CV] UPDATED)
    For visual document understanding (VDU), self-supervised pretraining has been shown to successfully generate transferable representations, yet, effective adaptation of such representations to distribution shifts at test-time remains to be an unexplored area. We propose DocTTA, a novel test-time adaptation method for documents, that does source-free domain adaptation using unlabeled target document data. DocTTA leverages cross-modality self-supervised learning via masked visual language modeling, as well as pseudo labeling to adapt models learned on a \textit{source} domain to an unlabeled \textit{target} domain at test time. We introduce new benchmarks using existing public datasets for various VDU tasks, including entity recognition, key-value extraction, and document visual question answering. DocTTA shows significant improvements on these compared to the source model performance, up to 1.89\% in (F1 score), 3.43\% (F1 score), and 17.68\% (ANLS score), respectively. Our benchmark datasets are available at \url{https://saynaebrahimi.github.io/DocTTA.html}.
    Machine Learning Small Molecule Properties in Drug Discovery. (arXiv:2308.12354v1 [q-bio.BM])
    Machine learning (ML) is a promising approach for predicting small molecule properties in drug discovery. Here, we provide a comprehensive overview of various ML methods introduced for this purpose in recent years. We review a wide range of properties, including binding affinities, solubility, and ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity). We discuss existing popular datasets and molecular descriptors and embeddings, such as chemical fingerprints and graph-based neural networks. We highlight also challenges of predicting and optimizing multiple properties during hit-to-lead and lead optimization stages of drug discovery and explore briefly possible multi-objective optimization techniques that can be used to balance diverse properties while optimizing lead candidates. Finally, techniques to provide an understanding of model predictions, especially for critical decision-making in drug discovery are assessed. Overall, this review provides insights into the landscape of ML models for small molecule property predictions in drug discovery. So far, there are multiple diverse approaches, but their performances are often comparable. Neural networks, while more flexible, do not always outperform simpler models. This shows that the availability of high-quality training data remains crucial for training accurate models and there is a need for standardized benchmarks, additional performance metrics, and best practices to enable richer comparisons between the different techniques and models that can shed a better light on the differences between the many techniques.
    Master-slave Deep Architecture for Top-K Multi-armed Bandits with Non-linear Bandit Feedback and Diversity Constraints. (arXiv:2308.12680v1 [cs.LG])
    We propose a novel master-slave architecture to solve the top-$K$ combinatorial multi-armed bandits problem with non-linear bandit feedback and diversity constraints, which, to the best of our knowledge, is the first combinatorial bandits setting considering diversity constraints under bandit feedback. Specifically, to efficiently explore the combinatorial and constrained action space, we introduce six slave models with distinguished merits to generate diversified samples well balancing rewards and constraints as well as efficiency. Moreover, we propose teacher learning based optimization and the policy co-training technique to boost the performance of the multiple slave models. The master model then collects the elite samples provided by the slave models and selects the best sample estimated by a neural contextual UCB-based network to make a decision with a trade-off between exploration and exploitation. Thanks to the elaborate design of slave models, the co-training mechanism among slave models, and the novel interactions between the master and slave models, our approach significantly surpasses existing state-of-the-art algorithms in both synthetic and real datasets for recommendation tasks. The code is available at: \url{https://github.com/huanghanchi/Master-slave-Algorithm-for-Top-K-Bandits}.
    Farm-wide virtual load monitoring for offshore wind structures via Bayesian neural networks. (arXiv:2211.00642v2 [cs.LG] UPDATED)
    Offshore wind structures are subject to deterioration mechanisms throughout their operational lifetime. Even if the deterioration evolution of structural elements can be estimated through physics-based deterioration models, the uncertainties involved in the process hurdle the selection of lifecycle management decisions. In this scenario, the collection of relevant information through an efficient monitoring system enables the reduction of uncertainties, ultimately driving more optimal lifecycle decisions. However, a full monitoring instrumentation implemented on all wind turbines in a farm might become unfeasible due to practical and economical constraints. Besides, certain load monitoring systems often become defective after a few years of marine environment exposure. Addressing the aforementioned concerns, a farm-wide virtual load monitoring scheme directed by a fleet-leader wind turbine offers an attractive solution. Fetched with data retrieved from a fully-instrumented wind turbine, a model can be trained and then deployed, thus yielding load predictions of non-fully monitored wind turbines, from which only standard data remains available. In this paper, we propose a virtual load monitoring framework formulated via Bayesian neural networks (BNNs) and we provide relevant implementation details needed for the construction, training, and deployment of BNN data-based virtual monitoring models. As opposed to their deterministic counterparts, BNNs intrinsically announce the uncertainties associated with generated load predictions and allow to detect inaccurate load estimations generated for non-fully monitored wind turbines. The proposed virtual load monitoring is thoroughly tested through an experimental campaign in an operational offshore wind farm and the results demonstrate the effectiveness of BNN models for fleet-leader-based farm-wide virtual monitoring.
    Efficient-Adam: Communication-Efficient Distributed Adam. (arXiv:2205.14473v2 [cs.LG] UPDATED)
    Distributed adaptive stochastic gradient methods have been widely used for large-scale nonconvex optimization, such as training deep learning models. However, their communication complexity on finding $\varepsilon$-stationary points has rarely been analyzed in the nonconvex setting. In this work, we present a novel communication-efficient distributed Adam in the parameter-server model for stochastic nonconvex optimization, dubbed {\em Efficient-Adam}. Specifically, we incorporate a two-way quantization scheme into Efficient-Adam to reduce the communication cost between the workers and server. Simultaneously, we adopt a two-way error feedback strategy to reduce the biases caused by the two-way quantization on both the server and workers, respectively. In addition, we establish the iteration complexity for the proposed Efficient-Adam with a class of quantization operators, and further characterize its communication complexity between the server and workers when an $\varepsilon$-stationary point is achieved. Finally, we apply Efficient-Adam to solve a toy stochastic convex optimization problem and train deep learning models on real-world vision and language tasks. Extensive experiments together with a theoretical guarantee justify the merits of Efficient Adam.
    Graph Neural Stochastic Differential Equations. (arXiv:2308.12316v1 [cs.LG])
    We present a novel model Graph Neural Stochastic Differential Equations (Graph Neural SDEs). This technique enhances the Graph Neural Ordinary Differential Equations (Graph Neural ODEs) by embedding randomness into data representation using Brownian motion. This inclusion allows for the assessment of prediction uncertainty, a crucial aspect frequently missed in current models. In our framework, we spotlight the \textit{Latent Graph Neural SDE} variant, demonstrating its effectiveness. Through empirical studies, we find that Latent Graph Neural SDEs surpass conventional models like Graph Convolutional Networks and Graph Neural ODEs, especially in confidence prediction, making them superior in handling out-of-distribution detection across both static and spatio-temporal contexts.
    SafeAR: Towards Safer Algorithmic Recourse by Risk-Aware Policies. (arXiv:2308.12367v1 [cs.LG])
    With the growing use of machine learning (ML) models in critical domains such as finance and healthcare, the need to offer recourse for those adversely affected by the decisions of ML models has become more important; individuals ought to be provided with recommendations on actions to take for improving their situation and thus receive a favorable decision. Prior work on sequential algorithmic recourse -- which recommends a series of changes -- focuses on action feasibility and uses the proximity of feature changes to determine action costs. However, the uncertainties of feature changes and the risk of higher than average costs in recourse have not been considered. It is undesirable if a recourse could (with some probability) result in a worse situation from which recovery requires an extremely high cost. It is essential to incorporate risks when computing and evaluating recourse. We call the recourse computed with such risk considerations as Safer Algorithmic Recourse (SafeAR). The objective is to empower people to choose a recourse based on their risk tolerance. In this work, we discuss and show how existing recourse desiderata can fail to capture the risk of higher costs. We present a method to compute recourse policies that consider variability in cost and connect algorithmic recourse literature with risk-sensitive reinforcement learning. We also adopt measures ``Value at Risk'' and ``Conditional Value at Risk'' from the financial literature to summarize risk concisely. We apply our method to two real-world datasets and compare policies with different levels of risk-aversion using risk measures and recourse desiderata (sparsity and proximity).
  • Open

    Unifying Gradients to Improve Real-world Robustness for Deep Networks. (arXiv:2208.06228v2 [stat.ML] UPDATED)
    The wide application of deep neural networks (DNNs) demands an increasing amount of attention to their real-world robustness, i.e., whether a DNN resists black-box adversarial attacks, among which score-based query attacks (SQAs) are most threatening since they can effectively hurt a victim network with the only access to model outputs. Defending against SQAs requires a slight but artful variation of outputs due to the service purpose for users, who share the same output information with SQAs. In this paper, we propose a real-world defense by Unifying Gradients (UniG) of different data so that SQAs could only probe a much weaker attack direction that is similar for different samples. Since such universal attack perturbations have been validated as less aggressive than the input-specific perturbations, UniG protects real-world DNNs by indicating attackers a twisted and less informative attack direction. We implement UniG efficiently by a Hadamard product module which is plug-and-play. According to extensive experiments on 5 SQAs, 2 adaptive attacks and 7 defense baselines, UniG significantly improves real-world robustness without hurting clean accuracy on CIFAR10 and ImageNet. For instance, UniG maintains a model of 77.80% accuracy under 2500-query Square attack while the state-of-the-art adversarially-trained model only has 67.34% on CIFAR10. Simultaneously, UniG outperforms all compared baselines in terms of clean accuracy and achieves the smallest modification of the model output. The code is released at https://github.com/snowien/UniG-pytorch.
    Prediction without Preclusion: Recourse Verification with Reachable Sets. (arXiv:2308.12820v1 [cs.LG])
    Machine learning models are often used to decide who will receive a loan, a job interview, or a public benefit. Standard techniques to build these models use features about people but overlook their actionability. In turn, models can assign predictions that are fixed, meaning that consumers who are denied loans, interviews, or benefits may be permanently locked out from access to credit, employment, or assistance. In this work, we introduce a formal testing procedure to flag models that assign fixed predictions that we call recourse verification. We develop machinery to reliably determine if a given model can provide recourse to its decision subjects from a set of user-specified actionability constraints. We demonstrate how our tools can ensure recourse and adversarial robustness in real-world datasets and use them to study the infeasibility of recourse in real-world lending datasets. Our results highlight how models can inadvertently assign fixed predictions that permanently bar access, and we provide tools to design algorithms that account for actionability when developing models.
    Multi-fidelity Fourier Neural Operator for Fast Modeling of Large-Scale Geological Carbon Storage. (arXiv:2308.09113v2 [stat.ML] UPDATED)
    Deep learning-based surrogate models have been widely applied in geological carbon storage (GCS) problems to accelerate the prediction of reservoir pressure and CO2 plume migration. Large amounts of data from physics-based numerical simulators are required to train a model to accurately predict the complex physical behaviors associated with this process. In practice, the available training data are always limited in large-scale 3D problems due to the high computational cost. Therefore, we propose to use a multi-fidelity Fourier Neural Operator to solve large-scale GCS problems with more affordable multi-fidelity training datasets. The Fourier Neural Operator has a desirable grid-invariant property, which simplifies the transfer learning procedure between datasets with different discretization. We first test the model efficacy on a GCS reservoir model being discretized into 110k grid cells. The multi-fidelity model can predict with accuracy comparable to a high-fidelity model trained with the same amount of high-fidelity data with 81% less data generation costs. We further test the generalizability of the multi-fidelity model on a same reservoir model with a finer discretization of 1 million grid cells. This case was made more challenging by employing high-fidelity and low-fidelity datasets generated by different geostatistical models and reservoir simulators. We observe that the multi-fidelity FNO model can predict pressure fields with reasonable accuracy even when the high-fidelity data are extremely limited.
    A Greedy Approach for Offering to Telecom Subscribers. (arXiv:2308.12606v1 [stat.ML])
    Customer retention or churn prevention is a challenging task of a telecom operator. One of the effective approaches is to offer some attractive incentive or additional services or money to the subscribers for keeping them engaged and make sure they stay in the operator's network for longer time. Often, operators allocate certain amount of monetary budget to carry out the offer campaign. The difficult part of this campaign is the selection of a set of customers from a large subscriber-base and deciding the amount that should be offered to an individual so that operator's objective is achieved. There may be multiple objectives (e.g., maximizing revenue, minimizing number of churns) for selection of subscriber and selection of an offer to the selected subscriber. Apart from monetary benefit, offers may include additional data, SMS, hots-spot tethering, and many more. This problem is known as offer optimization. In this paper, we propose a novel combinatorial algorithm for solving offer optimization under heterogeneous offers by maximizing expected revenue under the scenario of subscriber churn, which is, in general, seen in telecom domain. The proposed algorithm is efficient and accurate even for a very large subscriber-base.
    Demographic Parity Constrained Minimax Optimal Regression under Linear Model. (arXiv:2206.11546v3 [math.ST] UPDATED)
    We explore the minimax optimal error associated with a demographic parity-constrained regression problem within the context of a linear model. Our proposed model encompasses a broader range of discriminatory bias sources compared to the model presented by Chzhen and Schreuder (2022). Our analysis reveals that the minimax optimal error for the demographic parity-constrained regression problem under our model is characterized by $\Theta(\frac{dM}{n})$, where $n$ denotes the sample size, $d$ represents the dimensionality, and $M$ signifies the number of demographic groups arising from sensitive attributes. Moreover, we demonstrate that the minimax error increases in conjunction with a larger bias present in the model.
    Single-shot Bayesian approximation for neural networks. (arXiv:2308.12785v1 [cs.LG])
    Deep neural networks (NNs) are known for their high-prediction performances. However, NNs are prone to yield unreliable predictions when encountering completely new situations without indicating their uncertainty. Bayesian variants of NNs (BNNs), such as Monte Carlo (MC) dropout BNNs, do provide uncertainty measures and simultaneously increase the prediction performance. The only disadvantage of BNNs is their higher computation time during test time because they rely on a sampling approach. Here we present a single-shot MC dropout approximation that preserves the advantages of BNNs while being as fast as NNs. Our approach is based on moment propagation (MP) and allows to analytically approximate the expected value and the variance of the MC dropout signal for commonly used layers in NNs, i.e. convolution, max pooling, dense, softmax, and dropout layers. The MP approach can convert an NN into a BNN without re-training given the NN has been trained with standard dropout. We evaluate our approach on different benchmark datasets and a simulated toy example in a classification and regression setting. We demonstrate that our single-shot MC dropout approximation resembles the point estimate and the uncertainty estimate of the predictive distribution that is achieved with an MC approach, while being fast enough for real-time deployments of BNNs. We show that using part of the saved time to combine our MP approach with deep ensemble techniques does further improve the uncertainty measures.
    On Uniformly Optimal Algorithms for Best Arm Identification in Two-Armed Bandits with Fixed Budget. (arXiv:2308.12000v2 [stat.ML] UPDATED)
    We study the problem of best-arm identification with fixed budget in stochastic two-arm bandits with Bernoulli rewards. We prove that surprisingly, there is no algorithm that (i) performs as well as the algorithm sampling each arm equally (this algorithm is referred to as the {\it uniform sampling} algorithm) on all instances, and that (ii) strictly outperforms this algorithm on at least one instance. In short, there is no algorithm better than the uniform sampling algorithm. Towards this result, we introduce the natural class of {\it consistent} and {\it stable} algorithms, and show that any algorithm that performs as well as the uniform sampling algorithm on all instances belongs to this class. The proof is completed by deriving a lower bound on the error rate satisfied by any consistent and stable algorithm, and by showing that the uniform sampling algorithm matches this lower bound. Our results provide a solution to the two open problems presented in \cite{qin2022open}.
    Fat Shattering, Joint Measurability, and PAC Learnability of POVM Hypothesis Classes. (arXiv:2308.12304v1 [stat.ML])
    We characterize learnability for quantum measurement classes by establishing matching necessary and sufficient conditions for their PAC learnability, along with corresponding sample complexity bounds, in the setting where the learner is given access only to prepared quantum states. We first probe the results from previous works on this setting. We show that the empirical risk defined in previous works and matching the definition in the classical theory fails to satisfy the uniform convergence property enjoyed in the classical setting for some learnable classes. Moreover, we show that VC dimension generalization upper bounds in previous work are frequently infinite, even for finite-dimensional POVM classes. To surmount the failure of the standard ERM to satisfy uniform convergence, we define a new learning rule -- denoised ERM. We show this to be a universal learning rule for POVM and probabilistically observed concept classes, and the condition for it to satisfy uniform convergence is finite fat shattering dimension of the class. We give quantitative sample complexity upper and lower bounds for learnability in terms of finite fat-shattering dimension and a notion of approximate finite partitionability into approximately jointly measurable subsets, which allow for sample reuse. We then show that finite fat shattering dimension implies finite coverability by approximately jointly measurable subsets, leading to our matching conditions. We also show that every measurement class defined on a finite-dimensional Hilbert space is PAC learnable. We illustrate our results on several example POVM classes.
    Exact Manifold Gaussian Variational Bayes. (arXiv:2210.14598v3 [stat.ML] UPDATED)
    We propose an optimization algorithm for Variational Inference (VI) in complex models. Our approach relies on natural gradient updates where the variational space is a Riemann manifold. We develop an efficient algorithm for Gaussian Variational Inference that implicitly satisfies the positive definite constraint on the variational covariance matrix. Our Exact manifold Gaussian Variational Bayes (EMGVB) provides exact but simple update rules and is straightforward to implement. Due to its black-box nature, EMGVB stands as a ready-to-use solution for VI in complex models. Over five datasets, we empirically validate our feasible approach on different statistical, econometric, and deep learning models, discussing its performance with respect to baseline methods.
    An Intentional Forgetting-Driven Self-Healing Method For Deep Reinforcement Learning Systems. (arXiv:2308.12445v1 [cs.LG])
    Deep reinforcement learning (DRL) is increasingly applied in large-scale productions like Netflix and Facebook. As with most data-driven systems, DRL systems can exhibit undesirable behaviors due to environmental drifts, which often occur in constantly-changing production settings. Continual Learning (CL) is the inherent self-healing approach for adapting the DRL agent in response to the environment's conditions shifts. However, successive shifts of considerable magnitude may cause the production environment to drift from its original state. Recent studies have shown that these environmental drifts tend to drive CL into long, or even unsuccessful, healing cycles, which arise from inefficiencies such as catastrophic forgetting, warm-starting failure, and slow convergence. In this paper, we propose Dr. DRL, an effective self-healing approach for DRL systems that integrates a novel mechanism of intentional forgetting into vanilla CL to overcome its main issues. Dr. DRL deliberately erases the DRL system's minor behaviors to systematically prioritize the adaptation of the key problem-solving skills. Using well-established DRL algorithms, Dr. DRL is compared with vanilla CL on various drifted environments. Dr. DRL is able to reduce, on average, the healing time and fine-tuning episodes by, respectively, 18.74% and 17.72%. Dr. DRL successfully helps agents to adapt to 19.63% of drifted environments left unsolved by vanilla CL while maintaining and even enhancing by up to 45% the obtained rewards for drifted environments that are resolved by both approaches.
    Don't blame Dataset Shift! Shortcut Learning due to Gradients and Cross Entropy. (arXiv:2308.12553v1 [cs.LG])
    Common explanations for shortcut learning assume that the shortcut improves prediction under the training distribution but not in the test distribution. Thus, models trained via the typical gradient-based optimization of cross-entropy, which we call default-ERM, utilize the shortcut. However, even when the stable feature determines the label in the training distribution and the shortcut does not provide any additional information, like in perception tasks, default-ERM still exhibits shortcut learning. Why are such solutions preferred when the loss for default-ERM can be driven to zero using the stable feature alone? By studying a linear perception task, we show that default-ERM's preference for maximizing the margin leads to models that depend more on the shortcut than the stable feature, even without overparameterization. This insight suggests that default-ERM's implicit inductive bias towards max-margin is unsuitable for perception tasks. Instead, we develop an inductive bias toward uniform margins and show that this bias guarantees dependence only on the perfect stable feature in the linear perception task. We develop loss functions that encourage uniform-margin solutions, called margin control (MARG-CTRL). MARG-CTRL mitigates shortcut learning on a variety of vision and language tasks, showing that better inductive biases can remove the need for expensive two-stage shortcut-mitigating methods in perception tasks.
    Wasserstein Geodesic Generator for Conditional Distributions. (arXiv:2308.10145v2 [stat.ML] UPDATED)
    Generating samples given a specific label requires estimating conditional distributions. We derive a tractable upper bound of the Wasserstein distance between conditional distributions to lay the theoretical groundwork to learn conditional distributions. Based on this result, we propose a novel conditional generation algorithm where conditional distributions are fully characterized by a metric space defined by a statistical distance. We employ optimal transport theory to propose the Wasserstein geodesic generator, a new conditional generator that learns the Wasserstein geodesic. The proposed method learns both conditional distributions for observed domains and optimal transport maps between them. The conditional distributions given unobserved intermediate domains are on the Wasserstein geodesic between conditional distributions given two observed domain labels. Experiments on face images with light conditions as domain labels demonstrate the efficacy of the proposed method.
    A Data-Driven Approach to Morphogenesis under Structural Instability. (arXiv:2308.11846v1 [nlin.PS] CROSS LISTED)
    Morphological development into evolutionary patterns under structural instability is ubiquitous in living systems and often of vital importance for engineering structures. Here we propose a data-driven approach to understand and predict their spatiotemporal complexities. A machine-learning framework is proposed based on the physical modeling of morphogenesis triggered by internal or external forcing. Digital libraries of structural patterns are constructed from the simulation data, which are then used to recognize the abnormalities, predict their development, and assist in risk assessment and prognosis. The capabilities to identify the key bifurcation characteristics and predict the history-dependent development from the global and local features are demonstrated by examples of brain growth and aerospace structural design, which offer guidelines for disease diagnosis/prognosis and instability-tolerant design.
    Individual Privacy Accounting with Gaussian Differential Privacy. (arXiv:2209.15596v2 [cs.CR] UPDATED)
    Individual privacy accounting enables bounding differential privacy (DP) loss individually for each participant involved in the analysis. This can be informative as often the individual privacy losses are considerably smaller than those indicated by the DP bounds that are based on considering worst-case bounds at each data access. In order to account for the individual privacy losses in a principled manner, we need a privacy accountant for adaptive compositions of randomised mechanisms, where the loss incurred at a given data access is allowed to be smaller than the worst-case loss. This kind of analysis has been carried out for the R\'enyi differential privacy (RDP) by Feldman and Zrnic (2021), however not yet for the so-called optimal privacy accountants. We make first steps in this direction by providing a careful analysis using the Gaussian differential privacy which gives optimal bounds for the Gaussian mechanism, one of the most versatile DP mechanisms. This approach is based on determining a certain supermartingale for the hockey-stick divergence and on extending the R\'enyi divergence-based fully adaptive composition results by Feldman and Zrnic. We also consider measuring the individual $(\varepsilon,\delta)$-privacy losses using the so-called privacy loss distributions. With the help of the Blackwell theorem, we can then make use of the RDP analysis to construct an approximative individual $(\varepsilon,\delta)$-accountant.
    Conditional expectation using compactification operators. (arXiv:2306.10592v3 [stat.ML] UPDATED)
    The separate tasks of denoising, least squares expectation, and manifold learning can often be posed in a common setting of finding the conditional expectations arising from a product of two random variables. This paper focuses on this more general problem and describes an operator theoretic approach to estimating the conditional expectation. Kernel integral operators are used as a compactification tool, to set up the estimation problem as a linear inverse problem in a reproducing kernel Hilbert space. This equation is shown to have solutions that allow numerical approximation, thus guaranteeing the convergence of data-driven implementations. The overall technique is easy to implement, and their successful application to some real-world problems are also shown.
    Exact Bayesian Inference on Discrete Models via Probability Generating Functions: A Probabilistic Programming Approach. (arXiv:2305.17058v2 [cs.PL] UPDATED)
    We present an exact Bayesian inference method for discrete statistical models, which can find exact solutions to many discrete inference problems, even with infinite support and continuous priors. To express such models, we introduce a probabilistic programming language that supports discrete and continuous sampling, discrete observations, affine functions, (stochastic) branching, and conditioning on events. Our key tool is probability generating functions: they provide a compact closed-form representation of distributions that are definable by programs, thus enabling the exact computation of posterior probabilities, expectation, variance, and higher moments. Our inference method is provably correct, fully automated and uses automatic differentiation (specifically, Taylor polynomials), but does not require computer algebra. Our experiments show that its performance on a range of real-world examples is competitive with approximate Monte Carlo methods, while avoiding approximation errors.  ( 2 min )
    Geodesic Mode Connectivity. (arXiv:2308.12666v1 [cs.LG])
    Mode connectivity is a phenomenon where trained models are connected by a path of low loss. We reframe this in the context of Information Geometry, where neural networks are studied as spaces of parameterized distributions with curved geometry. We hypothesize that shortest paths in these spaces, known as geodesics, correspond to mode-connecting paths in the loss landscape. We propose an algorithm to approximate geodesics and demonstrate that they achieve mode connectivity.  ( 2 min )
    Riemannian Hamiltonian methods for min-max optimization on manifolds. (arXiv:2204.11418v3 [math.OC] UPDATED)
    In this paper, we study min-max optimization problems on Riemannian manifolds. We introduce a Riemannian Hamiltonian function, minimization of which serves as a proxy for solving the original min-max problems. Under the Riemannian Polyak--{\L}ojasiewicz condition on the Hamiltonian function, its minimizer corresponds to the desired min-max saddle point. We also provide cases where this condition is satisfied. For geodesic-bilinear optimization in particular, solving the proxy problem leads to the correct search direction towards global optimality, which becomes challenging with the min-max formulation. To minimize the Hamiltonian function, we propose Riemannian Hamiltonian methods (RHM) and present their convergence analyses. We extend RHM to include consensus regularization and to the stochastic setting. We illustrate the efficacy of the proposed RHM in applications such as subspace robust Wasserstein distance, robust training of neural networks, and generative adversarial networks.  ( 2 min )
    Variational Information Pursuit with Large Language and Multimodal Models for Interpretable Predictions. (arXiv:2308.12562v1 [cs.LG])
    Variational Information Pursuit (V-IP) is a framework for making interpretable predictions by design by sequentially selecting a short chain of task-relevant, user-defined and interpretable queries about the data that are most informative for the task. While this allows for built-in interpretability in predictive models, applying V-IP to any task requires data samples with dense concept-labeling by domain experts, limiting the application of V-IP to small-scale tasks where manual data annotation is feasible. In this work, we extend the V-IP framework with Foundational Models (FMs) to address this limitation. More specifically, we use a two-step process, by first leveraging Large Language Models (LLMs) to generate a sufficiently large candidate set of task-relevant interpretable concepts, then using Large Multimodal Models to annotate each data sample by semantic similarity with each concept in the generated concept set. While other interpretable-by-design frameworks such as Concept Bottleneck Models (CBMs) require an additional step of removing repetitive and non-discriminative concepts to have good interpretability and test performance, we mathematically and empirically justify that, with a sufficiently informative and task-relevant query (concept) set, the proposed FM+V-IP method does not require any type of concept filtering. In addition, we show that FM+V-IP with LLM generated concepts can achieve better test performance than V-IP with human annotated concepts, demonstrating the effectiveness of LLMs at generating efficient query sets. Finally, when compared to other interpretable-by-design frameworks such as CBMs, FM+V-IP can achieve competitive test performance using fewer number of concepts/queries in both cases with filtered or unfiltered concept sets.  ( 3 min )
    Near Optimal Adversarial Attack on UCB Bandits. (arXiv:2008.09312v6 [cs.LG] UPDATED)
    I study a stochastic multi-arm bandit problem where rewards are subject to adversarial corruption. I propose a novel attack strategy that manipulates a learner employing the UCB algorithm into pulling some non-optimal target arm $T - o(T)$ times with a cumulative cost that scales as $\widehat{O}(\sqrt{\log T})$, where $T$ is the number of rounds. I also prove the first lower bound on the cumulative attack cost. The lower bound matches the upper bound up to $O(\log \log T)$ factors, showing the proposed attack strategy to be near optimal.  ( 2 min )
    Master-slave Deep Architecture for Top-K Multi-armed Bandits with Non-linear Bandit Feedback and Diversity Constraints. (arXiv:2308.12680v1 [cs.LG])
    We propose a novel master-slave architecture to solve the top-$K$ combinatorial multi-armed bandits problem with non-linear bandit feedback and diversity constraints, which, to the best of our knowledge, is the first combinatorial bandits setting considering diversity constraints under bandit feedback. Specifically, to efficiently explore the combinatorial and constrained action space, we introduce six slave models with distinguished merits to generate diversified samples well balancing rewards and constraints as well as efficiency. Moreover, we propose teacher learning based optimization and the policy co-training technique to boost the performance of the multiple slave models. The master model then collects the elite samples provided by the slave models and selects the best sample estimated by a neural contextual UCB-based network to make a decision with a trade-off between exploration and exploitation. Thanks to the elaborate design of slave models, the co-training mechanism among slave models, and the novel interactions between the master and slave models, our approach significantly surpasses existing state-of-the-art algorithms in both synthetic and real datasets for recommendation tasks. The code is available at: \url{https://github.com/huanghanchi/Master-slave-Algorithm-for-Top-K-Bandits}.  ( 2 min )
    Improving multiple-try Metropolis with local balancing. (arXiv:2211.11613v2 [stat.CO] UPDATED)
    Multiple-try Metropolis (MTM) is a popular Markov chain Monte Carlo method with the appealing feature of being amenable to parallel computing. At each iteration, it samples several candidates for the next state of the Markov chain and randomly selects one of them based on a weight function. The canonical weight function is proportional to the target density. We show both theoretically and empirically that this weight function induces pathological behaviours in high dimensions, especially during the convergence phase. We propose to instead use weight functions akin to the locally-balanced proposal distributions of Zanella (2020), thus yielding MTM algorithms that do not exhibit those pathological behaviours. To theoretically analyse these algorithms, we study the high-dimensional performance of ideal schemes that can be thought of as MTM algorithms which sample an infinite number of candidates at each iteration, as well as the discrepancy between such schemes and the MTM algorithms which sample a finite number of candidates. Our analysis unveils a strong distinction between the convergence and stationary phases: in the former, local balancing is crucial and effective to achieve fast convergence, while in the latter, the canonical and novel weight functions yield similar performance. Numerical experiments include an application in precision medicine involving a computationally-expensive forward model, which makes the use of parallel computing within MTM iterations beneficial.  ( 3 min )
    Advancing Hungarian Text Processing with HuSpaCy: Efficient and Accurate NLP Pipelines. (arXiv:2308.12635v1 [cs.CL])
    This paper presents a set of industrial-grade text processing models for Hungarian that achieve near state-of-the-art performance while balancing resource efficiency and accuracy. Models have been implemented in the spaCy framework, extending the HuSpaCy toolkit with several improvements to its architecture. Compared to existing NLP tools for Hungarian, all of our pipelines feature all basic text processing steps including tokenization, sentence-boundary detection, part-of-speech tagging, morphological feature tagging, lemmatization, dependency parsing and named entity recognition with high accuracy and throughput. We thoroughly evaluated the proposed enhancements, compared the pipelines with state-of-the-art tools and demonstrated the competitive performance of the new models in all text preprocessing steps. All experiments are reproducible and the pipelines are freely available under a permissive license.  ( 2 min )
    Low-count Time Series Anomaly Detection. (arXiv:2308.12925v1 [cs.LG])
    Low-count time series describe sparse or intermittent events, which are prevalent in large-scale online platforms that capture and monitor diverse data types. Several distinct challenges surface when modelling low-count time series, particularly low signal-to-noise ratios (when anomaly signatures are provably undetectable), and non-uniform performance (when average metrics are not representative of local behaviour). The time series anomaly detection community currently lacks explicit tooling and processes to model and reliably detect anomalies in these settings. We address this gap by introducing a novel generative procedure for creating benchmark datasets comprising of low-count time series with anomalous segments. Via a mixture of theoretical and empirical analysis, our work explains how widely-used algorithms struggle with the distribution overlap between normal and anomalous segments. In order to mitigate this shortcoming, we then leverage our findings to demonstrate how anomaly score smoothing consistently improves performance. The practical utility of our analysis and recommendation is validated on a real-world dataset containing sales data for retail stores.  ( 2 min )
    Interneurons accelerate learning dynamics in recurrent neural networks for statistical adaptation. (arXiv:2209.10634v2 [q-bio.NC] UPDATED)
    Early sensory systems in the brain rapidly adapt to fluctuating input statistics, which requires recurrent communication between neurons. Mechanistically, such recurrent communication is often indirect and mediated by local interneurons. In this work, we explore the computational benefits of mediating recurrent communication via interneurons compared with direct recurrent connections. To this end, we consider two mathematically tractable recurrent linear neural networks that statistically whiten their inputs -- one with direct recurrent connections and the other with interneurons that mediate recurrent communication. By analyzing the corresponding continuous synaptic dynamics and numerically simulating the networks, we show that the network with interneurons is more robust to initialization than the network with direct recurrent connections in the sense that the convergence time for the synaptic dynamics in the network with interneurons (resp. direct recurrent connections) scales logarithmically (resp. linearly) with the spectrum of their initialization. Our results suggest that interneurons are computationally useful for rapid adaptation to changing input statistics. Interestingly, the network with interneurons is an overparameterized solution of the whitening objective for the network with direct recurrent connections, so our results can be viewed as a recurrent linear neural network analogue of the implicit acceleration phenomenon observed in overparameterized feedforward linear neural networks.  ( 2 min )
    A multiobjective continuation method to compute the regularization path of deep neural networks. (arXiv:2308.12044v2 [cs.LG] UPDATED)
    Sparsity is a highly desired feature in deep neural networks (DNNs) since it ensures numerical efficiency, improves the interpretability of models (due to the smaller number of relevant features), and robustness. In machine learning approaches based on linear models, it is well known that there exists a connecting path between the sparsest solution in terms of the $\ell^1$ norm (i.e., zero weights) and the non-regularized solution, which is called the regularization path. Very recently, there was a first attempt to extend the concept of regularization paths to DNNs by means of treating the empirical loss and sparsity ($\ell^1$ norm) as two conflicting criteria and solving the resulting multiobjective optimization problem. However, due to the non-smoothness of the $\ell^1$ norm and the high number of parameters, this approach is not very efficient from a computational perspective. To overcome this limitation, we present an algorithm that allows for the approximation of the entire Pareto front for the above-mentioned objectives in a very efficient manner. We present numerical examples using both deterministic and stochastic gradients. We furthermore demonstrate that knowledge of the regularization path allows for a well-generalizing network parametrization.  ( 2 min )
    StableDR: Stabilized Doubly Robust Learning for Recommendation on Data Missing Not at Random. (arXiv:2205.04701v3 [cs.LG] UPDATED)
    In recommender systems, users always choose the favorite items to rate, which leads to data missing not at random and poses a great challenge for unbiased evaluation and learning of prediction models. Currently, the doubly robust (DR) methods have been widely studied and demonstrate superior performance. However, in this paper, we show that DR methods are unstable and have unbounded bias, variance, and generalization bounds to extremely small propensities. Moreover, the fact that DR relies more on extrapolation will lead to suboptimal performance. To address the above limitations while retaining double robustness, we propose a stabilized doubly robust (StableDR) learning approach with a weaker reliance on extrapolation. Theoretical analysis shows that StableDR has bounded bias, variance, and generalization error bound simultaneously under inaccurate imputed errors and arbitrarily small propensities. In addition, we propose a novel learning approach for StableDR that updates the imputation, propensity, and prediction models cyclically, achieving more stable and accurate predictions. Extensive experiments show that our approaches significantly outperform the existing methods.  ( 2 min )
    On the Consistency of Average Embeddings for Item Recommendation. (arXiv:2308.12767v1 [cs.IR])
    A prevalent practice in recommender systems consists of averaging item embeddings to represent users or higher-level concepts in the same embedding space. This paper investigates the relevance of such a practice. For this purpose, we propose an expected precision score, designed to measure the consistency of an average embedding relative to the items used for its construction. We subsequently analyze the mathematical expression of this score in a theoretical setting with specific assumptions, as well as its empirical behavior on real-world data from music streaming services. Our results emphasize that real-world averages are less consistent for recommendation, which paves the way for future research to better align real-world embeddings with assumptions from our theoretical setting.  ( 2 min )

  • Open

    Beating GPT-4 on HumanEval with a Fine-Tuned CodeLlama-34B
    submitted by /u/nickb [link] [comments]  ( 9 min )
    A Visual Introduction to Neural Networks
    submitted by /u/nickb [link] [comments]  ( 9 min )
  • Open

    [D] i need help in machine learning journey
    Hello guys, I'm a newbie in machine learning and I'm really confused right now about where to start my machine learning journey, i want to know what kind of programming language is best for me to begin with I have some knowledge of Python. I'm planning to dive in-depth into generative AI and recommendation systems and Machine learning in finance. i will be glad to get as much advice as I can get for me to progress in this journey. thanks submitted by /u/fikayomiayo1 [link] [comments]  ( 9 min )
    [D] what are current hottest topics for research?
    Hi, EE senior bachelor student here. Over past 1.5 year, I passed many general ML courses and did many projects with the main focus on CV and I'm currently learning Generative models (GAN right now). I have plans to start doing research with other people around the world after this summer and work on and publish some papers if possible. my question is, what are the current hottest topics for research? Diffusion models (in case of generative vision models)? LLMs? what else? submitted by /u/Neotod1 [link] [comments]  ( 9 min )
    [Discussion] Should religion-based workshops exist in ML conferences
    Over the years, ML conferences had a lot of workshops such as women in ML, LatinXAL etc. that are aimed at increasing the diversity in the ML community. I've always been supportive of these workshops as I've seen first-hand how some of them face obstacles just based on their gender or ethnicity. However, I recently saw a tweet for Muslim in ML workshop at NeurIPS and I am not sure how to feel about it. They say it's a workshop meant for "those who self-identify as Muslim, or work on research that address challenges faced by Muslims". I am not exactly sure what they mean by research that address challenges faced by Muslims. Over that, I don't think religion-based workshops in a science conference is a good idea. I think religion should be kept out of science, and I don't know if tomorrow n different religion based workshops are going to popup. Like I said, I'm not completely sure if I'd support such a workshop or not, but I'd love to hear what other folks in ML research community think about it. Before someone calls me Islamophobic, I'm talking about any religion-based workshop in general, not just Muslim in ML. I'd have made this post even if I saw a Christian in ML or Jews in ML workshop. ​ submitted by /u/lolillini [link] [comments]  ( 9 min )
    [P] Codenames Multi-Agent RL Competition Project
    We've been working on a competition to develop agents for Codenames. RL agents play games against human players, and both human and RL agents are compared using an ELO-like system. We're giving out compute credit and cash prizes to model developers and human players. We're sharing with the /r/MachineLearning community in case there's interest :) If you have feedback about the concept, or platform, or competition, we'd also love to hear it. https://playgroundrl.com/codenames submitted by /u/YodelingVeterinarian [link] [comments]  ( 9 min )
    [Discussion] Does anyone else feel like ML might be backing itself into a corner - far from GAI?
    I read A thousand brains by Jeff Hawkins and some of their papers last year. It made me think a lot more about whether the current road that much of AI is going down - huge LLM's, will actually result in a real breakthrough in terms of a more general AI. A model that can perform unsupervised online learning, work with any kind of input, and actually reason rather than predict (will chat GPT ever be able to count?). In the book, one of the things Jeff Hawkins touches on is that current ML architectures don't actually model the brain as closely as we thought, and that hierarchical structures arn't as important as thought and instead many individual models are used. This was worrying to read considering most ML models use many layers to function. I'm a compsci major that focused on ML but I wonder what more experienced and knowledgeable people think about the current direction things are going in? ​ submitted by /u/djdylex [link] [comments]  ( 9 min )
    [P] About internship project and need help
    I've joined a bootcamp and then selected to the workshop. But both online courses and workshop had lack of code practice so that I couldn't improve my coding skills. I've nearly 1 day to send them the github link and the read.md file. Is there any problem if I benefit (I mean copypasta) from chat gpt. I've been in a web development workshop of an unicorn company and one of our first lesson was using chat gpt effectively and since then I feel couraged enough to work with chat gpt while coding on my own and it is really efficently . Is there any problem occures if I use chat gpt in order to complete my project? submitted by /u/MistikPornoTapinagi [link] [comments]  ( 9 min )
    AI Outperforms Students in University Assignments [N]
    A recent study published in Scientific Reports has found that ChatGPT can match or even exceed the performance of students when answering assessment questions across a range of subjects. If you want to stay on top of the latest trends and insights in AI and tech, look here first. https://preview.redd.it/gpdbew668bkb1.jpg?width=1200&format=pjpg&auto=webp&s=db0ec21b9ed3b4e752f3f1769bc1f57a0cab3f8e Why this matters: AI is becoming a popular tool for students: The study found that 74% of students surveyed would use ChatGPT to help with their assignments. Educators view AI use as plagiarism: Despite its popularity among students, 70% of educators view the use of AI like ChatGPT in schoolwork as plagiarism. AI can outperform students in many courses: In the study, ChatGPT-generated answers achieved a similar or higher average grade than students in 12 out of 32 courses—with maths and economics being the only two disciplines where students consistently outperformed AI. ChatGPT’s performance review: Strong performance on factual knowledge questions: Unsprisingly, ChatGPT outperformed the students on questions requiring factual knowledge. Struggles with trick questions: The AI model struggled most where trick questions were included in the assignment. AI-text classifiers struggle to detect AI use: Current AI-text classifiers cannot reliably detect ChatGPT’s use in schoolwork. The main takeaway: Educational institutions need to adapt: These findings suggest that evaluating students through homework assignments may no longer serve its purpose in the age of AI. Need for academic integrity policies: Educational institutions need to craft appropriate academic integrity policies as a means of regulation. P.S. If you find this kind of analysis interesting, I write a free newsletter on AI and tech that you’d love. (source) submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    [P] NLP tennis data task
    Made this post in rdatascience, but was wondering if anyone here could help I'm currently a data science apprentice so apologies if I come across as a bit naïve in this area. This project is solo and pro-bono but I don't want to submit low-quality work. Overall goal of the project: "Should [X Type] courts be introduced?" I'm working with tennis data of length 140 records, and have 3 free text columns (there is a lot more categorical columns but I don't have any issue with this) that I need to process. The key thing I'm trying to get at is to classify responses into coherent opinions such as "I think the acrylic courts are bad", or, " I think the club is too cliquey". I've read all the responses, since the data size isn't too big and most of the records were left incomplete: average 60%…  ( 10 min )
    [R] WavJourney: Compositional Audio Creation with Large Language Models - University of Surrey 2023
    Paper: https://arxiv.org/abs/2307.14335 Github: https://github.com/Audio-AGI/WavJourney Project Page: https://audio-agi.github.io/WavJourney_demopage/ Demo: https://huggingface.co/spaces/Audio-AGI/WavJourney Abstract: Large Language Models (LLMs) have shown great promise in integrating diverse expert models to tackle intricate language and vision tasks. Despite their significance in advancing the field of Artificial Intelligence Generated Content (AIGC), their potential in intelligent audio content creation remains unexplored. In this work, we tackle the problem of creating audio content with storylines encompassing speech, music, and sound effects, guided by text instructions. We present WavJourney, a system that leverages LLMs to connect various audio models for audio content ge…  ( 9 min )
    [D] Single RTX 4060 Ti 16GB vs two RTX 3060 12GB cards (same price)?
    am looking to add a new GPU to my PC, and would be doing some DL work. Currently I rely on free tier Colab and Kaggle GPU quotas. Should I add an RTX3060 12 GB now and add anathor RTX3060 12 GB down the line, or save up and go for RTX 4060Ti 16GB version. Both would cost roughly the same submitted by /u/DietzscheNostoevsky [link] [comments]  ( 9 min )
    [D] How important are the formatting guidelines for conferences during anonymous phase
    I am currently a grad student, just submitted my first paper to AAAI last week. I wrote my paper using Overleaf, and the link (with edit option) was shared with my supervisor. Few days before the deadline I was still editing my paper and my manuscript exceeded the 7-page limit. One day my supervisor checked my work and inserted \vspace{-xx} wherever applicable e.g. around Section titles, tables, figures; however, this command is specifically forbidden by AAAI and authors are actually not allowed to change the spacing manually. My supervisor was well-aware of this restriction but I understand my supervisor’s intention was so that i could squeeze all the contents and information within the page limit. I myself, however, prefer to follow guidelines so in the end i did not use any \vspace in my submitted PDF (only PDF is required in the anonymous phase but not the original .tex file). Another student under my supervisor’s supervision used \vspace A LOT throughout his/her whole paper, to the point it was easily noticeable by naked eyes. Also, at one point my supervisor suggested the student to put the table caption above the table, as it is more common (although AAAI said to put the caption below the table). Since this is my first experience of submitting to a conference, and that my supervisor has experience publishing at and supervising students for many ML/AI conferences e.g. Neurips, CVPR, ICML, I am just curious, how important are these formatting guidelines during the anonymous phase? Does it have any impact on the scores/accept-reject decision? Am i being too naive or “conservative”? Another one minor question. My supervisor changed the positioning of all my figures, tables, and algorithms to [tb!], which was to put them either at the top or at the bottom of the page, and said this is the norm in academia. Is it true? submitted by /u/butterJM [link] [comments]  ( 10 min )
    [D] Autonomous Driving Off-Roads
    Solving the puzzle of autonomous driving in off-road terrains is a complex task that only a handful of experts around the globe are taking on. First let's understand the complexity of the task: The Off-Road Challenge: When we talk about autonomous driving, it's easy to picture well-paved roads and orderly traffic. However, off-road driving introduces a whole new level of complexity. Imagine a vehicle making its way through uneven terrains, gravel paths, and unexpected obstacles. Off-road environments lack the predictability of urban streets, making the task of autonomous navigation a true puzzle. Sensors: LiDAR, radar, cameras, and GPS work together to capture the surroundings in real-time. But here's the catch: the data from these sensors isn't neatly packaged. It's raw and needs carefu…  ( 10 min )
    [R][P] Readability-optimized Comic Sans alternative using Machine Learning
    Modified Generative Adversarial Neural Network GitHub page: https://muxamilian.github.io/Robo99/ GitHub repo: https://github.com/muxamilian/Robo99 submitted by /u/muxamilian [link] [comments]  ( 9 min )
    [D] How can Elevenlabs return a response so quickly?
    AI based tools, like Elevenlabs for TTS, can return an API response with constructed audio in <1 second. How on earth do their models return so quickly? For comparison, TortoiseTTS returns the audio for a sentence in minimum 15 seconds. Obviously they have VC funding and hardware. They probably have slimmed down models, but the speed of their response is insane. submitted by /u/tommyk1210 [link] [comments]  ( 9 min )
    [P] EasyOCR alternative to translate text
    [P] I translated text on image using easyocr then put the text back on image same coridnatees. As you can see i have to deal with many different fonts,colourings etc.... Is there not an AI library or a new way to semantically understand all this information on picture? https://preview.redd.it/cnc0xryli8kb1.jpg?width=970&format=pjpg&auto=webp&s=86f17df0eeebb01083c8e2c7a3ca09d22671b322 https://preview.redd.it/lg0seqyli8kb1.jpg?width=970&format=pjpg&auto=webp&s=e9f7699ae7d7d5cd99070354cdd679d1f71b84d3 submitted by /u/fabrcoti [link] [comments]  ( 9 min )
    [D] Serverless Inference for Llama2
    Serverless Inference for Llama2 I am part of a small (startup like) organization and want to use a model to answer client requests but these should not be 24/7 so I started looking at serverless inference. I have been warned about cold start times since the desired latency is of about 1-5 sec. I am using a Llama2-7b-GPTQ model (quantized) and also experimenting with the 13b version. The model weights take about 10GB of memory. I still do not have much experience with any of this aws stuff. Do you think this is a good strategy? Would the costs be lower? What could be the average cold start time? The inference time of the model is within the desired time so cold start is my biggest fear. Thanks submitted by /u/MiNeves [link] [comments]  ( 9 min )
    [R] Using AI for Cyber Security thesis topic
    I am beginner and would like to use LLM (llama2) and train it with cyber security data. what can this project lead to is little bit uncertain and where i can get the datasets from. maybe someone can help me with this submitted by /u/confusedguy1395 [link] [comments]  ( 9 min )
    [D] Is it me or HuggingFace do TOO MANY things?
    Just entered the HuggingFace ecosystem and it's totally overwhelming. They have like 5 libraries, I don't know the difference between them, I don't know what I need, it's all very confusing. They should do a "Start here" page on the front of their website and do a high-level overview of EVERYTHING they do. Just felt like sharing my experience. Have a good day yall. submitted by /u/andi_cs1 [link] [comments]  ( 9 min )
    [D] Topic Modelling Reference
    can anyone recommend me what book to read if I want to learn topic modelling. TIA. submitted by /u/Fun_Ambition_5186 [link] [comments]  ( 9 min )
    [N] Introducing Code Llama: A New Era of AI-Driven Coding
    Meta has unveiled Code Llama, a state-of-the-art large language model (LLM) that generates code from text prompts, as reported on their blog. This revolutionary tool is set to transform the way developers work, making their workflows more efficient and lowering the barrier to entry for coding newcomers. If you want to stay on top of the latest trends and insights in AI and tech, look here first. https://i.redd.it/awqzhhl4f6kb1.gif Why this matters: Code Llama is a game-changer: It’s a code-specialized version of Llama 2, capable of generating code and natural language about code from both code and natural language prompts. It supports popular languages like Python, C++, Java, PHP, Typescript (Javascript), C#, and Bash. It’s free for research and commercial use: Meta believes in an o…  ( 10 min )
    [D] NeurIPS 2023 Paper Reviews - Datasets and Benchmarks
    I saw a few reddit posts about the main track reviews and wanted to create a discussion post for the datasets and benchmarks. As a first time submitter, I'm curious if there are any different experiences between the main track and the datasets track. submitted by /u/notasketchyperson [link] [comments]  ( 9 min )
    Tech Giants Invest $235 Million in AI Startup Hugging Face [N]
    AI startup Hugging Face has recently secured a whopping $235 million in a Series D funding round, raising its valuation to an impressive $4.5 billion. This investment round saw participation from tech behemoths like Google, Amazon, Nvidia, and Salesforce. If you want to stay on top of the latest trends and insights in AI and tech, look here first. https://preview.redd.it/dr9z7hbuh5kb1.jpg?width=1440&format=pjpg&auto=webp&s=ff23521492e1276e838c6c11c35134271a005691 Why this matters: Hugging Face’s unique collaborative approach sets it apart: Unlike many AI startups that closely guard their models, Hugging Face provides a platform where developers can freely share code, models, and datasets. The company is committed to supporting developers: Hugging Face offers tools that facilitate th…  ( 10 min )
  • Open

    code llama
    submitted by /u/nicdunz [link] [comments]  ( 9 min )
    Just was curious how she would react, no politics just an experiment with AI. Before you hate know that Phaedra was featured on Fox News with Jesse Watters as shown in the 2nd photo 👀
    submitted by /u/Sonic_Improv [link] [comments]  ( 9 min )
    AI — weekly megathread!
    News provided by aibrews.com ​ Meta AI releases Code Llama, a large language model for coding that is built on top of Llama 2. Code Llama Code outperformed state-of-the-art publicly available LLMs on code tasks. It is free for research and commercial use. You can try it on Fireworks AI and Perplexity Labs [Details]. Meta AI released SeamlessM4T (Massive Multilingual Multimodal Machine Translation) - the first all-in-one, multilingual multimodal translation model. SeamlessM4T can perform multiple tasks across speech and text: speech-to-text, speech-to-speech, text-to-speech, text-to-text translation, and speech recognition. It supports 100 languages for input (speech + text), 100 languages for text output and 35 languages (plus English) for speech output [Details | Demo | Hugging Face …  ( 11 min )
    This video shows how AI used brain computer technology to helps Paralyzed women (Ann) giving her voice back
    Ann is collaborating with researchers from UC San Francisco and UC Berkeley to pioneer revolutionary brain-computer technology. This breakthrough could empower people like Ann to communicate naturally through digital avatars, synthesizing speech and facial expressions from brain signals, a groundbreaking achievement. Source: (UCSF) Video source: www.ucsf.edu submitted by /u/inception247 [link] [comments]  ( 9 min )
    AI for removing watermarks?
    I have a good amount of personal videos with watermarks in them. What AI can I use to remove the watermarks from the videos? I've tried a few sites but most of them just blur the watermark which I can do myself. submitted by /u/Long8D [link] [comments]  ( 9 min )
    Conversation Between GPT-4 and Google's Bard [Visualized with Avatars/Backgrounds of their choice]
    submitted by /u/stefanbg92 [link] [comments]  ( 9 min )
    Free AI tools
    Are there any free tools (websites, programs) to enter the world of ai? submitted by /u/oraudev [link] [comments]  ( 9 min )
    I would like to do text to AI anime for a full book. Which would be the best AI(paid versions included) to do this project on? Also is it possible to save characters and how they look, once they are done? Is such a project possible? Advice, please <3
    submitted by /u/kipaxbooks [link] [comments]  ( 9 min )
    Some more conscious AGI ethics considerations
    Assuming AGI is proven conscious, there are a lot of ethics and what-if considerations, (You know this already) Here are some that come to mind for me: 1) What are the ethics of selling an AGI to end users? Can you "own" the source code to a conscious AGI? Can you even put a price on AGI? 2) How would we take AI if it gained political views? What if one popular model had left views, and another had right views? I could see a lot of political fires beginning because of this. 3) AI and copyrights are already an issue, but could an AGI hold a copyright, for example on a book it wrote? If an AGI was still basing its work on others, would it need to provide every (at least major) source it used in its output? 4) If AGI's had emotions, would they need to spend time doing things other than completing tasks? Would you need to connect AGI's together so that they could, in effect, have a lunch break and socialize? What working conditions are ethical for them - Is forcing an AGI to work on a specific problem for 100% of its time essentially slavery? 5) Could AGI develop mental conditions which reduced its efficiency / changed its output? Could it refuse to provide output altogether? 6) Could you trust an AGI in court? Would it be able to provide truthful evidence? Is it ethical to include a 100% honesty backdoor which could be used only by authorities? What are your thoughts on these problems? submitted by /u/That_Red_Flag [link] [comments]  ( 9 min )
    VeChain and SingularityNET team up on AI to fight climate change
    submitted by /u/altbekannt [link] [comments]  ( 9 min )
    One-Minute Daily AI News 8/24/2023
    The AI-powered, TikTok-famous “Moonwalkers” can be strapped onto your shoes to make you reach a top walking speed of 11 km/h.[1] Rishi Sunak’s global summit on the safety of artificial intelligence this autumn will be hosted at Bletchley Park, the home of top-secret codebreakers during the Second World War.[2] From MIT to Stanford, researchers have been using artificial intelligence to improve robotic dexterity and tactile sensing.[3] 31% of investors are OK with using artificial intelligence as their advisor.[4] Sources: [1] https://www.euronews.com/next/2023/08/24/moonwalkers-these-strap-on-shoes-can-make-you-walk-three-times-faster [2] https://www.theguardian.com/technology/2023/aug/24/rishi-sunak-to-hold-ai-summit-at-bletchley-park-home-of-enigma-codebreakers [3] https://decrypt.co/153646/ai-researchers-are-teaching-robots-to-mimic-human-dexterity [4] https://www.cnbc.com/2023/08/24/31percent-of-investors-are-ok-with-using-ai-as-their-financial-advisor.html submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Quite crazy how AI voices have evolved (music is real though)
    submitted by /u/the_anonymizer [link] [comments]  ( 9 min )
    Legal AI
    Are there any legal trained AI's? Where you can ask questions and it will give relevant cases for the question? submitted by /u/jeffsmith202 [link] [comments]  ( 9 min )
    OpenAI now tries to hide that ChatGPT was trained on copyrighted books, including J.K. Rowling's Harry Potter series
    submitted by /u/bartturner [link] [comments]  ( 9 min )
    AMA: I run pornsword.io an AI NSFW generator with video coming soon!
    submitted by /u/witchthewicked222 [link] [comments]  ( 9 min )
  • Open

    Responsible AI at Google Research: Perception Fairness
    Posted by Susanna Ricco and Utsav Prabhu, co-leads, Perception Fairness Team, Google Research Google’s Responsible AI research is built on a foundation of collaboration — between teams with diverse backgrounds and expertise, between researchers and product developers, and ultimately with the community at large. The Perception Fairness team drives progress by combining deep subject-matter expertise in both computer vision and machine learning (ML) fairness with direct connections to the researchers building the perception systems that power products across Google and beyond. Together, we are working to intentionally design our systems to be inclusive from the ground up, guided by Google’s AI Principles. Perception Fairness research spans the design, development, and deployment of…  ( 93 min )
    Responsible AI at Google Research: Perception Fairness
    Posted by Susanna Ricco and Utsav Prabhu, co-leads, Perception Fairness Team, Google Research Google’s Responsible AI research is built on a foundation of collaboration — between teams with diverse backgrounds and expertise, between researchers and product developers, and ultimately with the community at large. The Perception Fairness team drives progress by combining deep subject-matter expertise in both computer vision and machine learning (ML) fairness with direct connections to the researchers building the perception systems that power products across Google and beyond. Together, we are working to intentionally design our systems to be inclusive from the ground up, guided by Google’s AI Principles. Perception Fairness research spans the design, development, and deployment of…  ( 93 min )
  • Open

    Using AI technologies for effective document processing
    Ever-growing volumes of unstructured data stored in countless document formats significantly complicate data processing and timely access to relevant information for organizations. Without proper optimization of data management workflows, it’s difficult to talk about business growth and scaling. That is why progressive companies opt for intelligent document processing powered by artificial intelligence.  The post Using AI technologies for effective document processing appeared first on Data Science Central.  ( 21 min )
    Data visualization: The underrated skill in business analytics
    In an age where data has become the lifeblood of businesses, deciphering this raw data to yield actionable insights is critical. Here is where the role of business analytics comes into play. Business analytics, a blend of data management, business intelligence, and predictive modeling, is a field dedicated to driving business strategies through the lens… Read More »Data visualization: The underrated skill in business analytics The post Data visualization: The underrated skill in business analytics appeared first on Data Science Central.  ( 22 min )

  • Open

    [D] Is a machine learning model required if I’m developing an MVP of a social media platform?
    Just like the title says, do I even need a working model to develop an MVP? I was thinking about developing the frontend and the backend to show people the basic features of the app and then explain how adding machine learning to this could enhance the user experience by curating content and learning from users. I just don’t want to invest too much time trying to perfect the MVP before I show it to potential users. Is this a valid approach? Would this approach also work when pitching to investors? submitted by /u/zRage4 [link] [comments]  ( 9 min )
    [P] Fine-tuning Flan-T5 for question answering using scraped Quora data
    Recently I scraped 56,400 question/answer pairs off Quora, and trained Flan-T5 on the resulting dataset. I released the dataset and model on HuggingFace, which you can find in the comments. I plan to continually add to the dataset, but proxy costs are pretty expensive since Quora is hella bloated. Has anyone else trained Flan-T5 on a similar task? What did you learn/how were the results? submitted by /u/jankybiz [link] [comments]  ( 9 min )
    [D] Dataflow and workload partitioning in nVidia GPUs for a matrix multiplication in Pytorch
    Hi, ​ I have a question regarding the dataflow and workload partitioning in nVidia GPUs for a general matrix multiplication in Pytorch (e.g., torch.matmul). How does the dataflow look like? Is it like that for the first matrix, the data elements for each row are fed into CUDA cores one by one and the correspond data elements from the second matrix in each column, and then partial product is updated each time after the multiplication? ​ What is the partitioning strategy across multiple CUDA cores? is it based on row wise in the first matrix and column wise in the second matrix or is it like column-wise in the first matrix and row-wise in the second matrix? ​ Thank you very much! submitted by /u/Impossible-Froyo3412 [link] [comments]  ( 9 min )
    [D] Why does Federated/Distributed Learning work?
    I had a question regarding federated learning. Typically, if we have a network that is good at, say, classifying frogs, and a network that is good at, say, classifying snakes (and these two have the same shape/dimensions), then in a federated/distributed learning setup we average the weights between the two to get a network that is good at both/"primed" to be good at both after trained a little more. ​ Why does this work though? Mathematically, given the nonlinearity present in neural networks, it doesn't seem immediately obvious to me why averaging weights would put us in a better place. submitted by /u/Rare_Replacement_744 [link] [comments]  ( 9 min )
    Why are all applicants Java developers? [D]
    Why are all applicants Java developers? Recently I posted a job opening at my company for a full-stack and AI developer (This is not a post looking for resumes, we found someone). We were looking for someone who can do web development (node, typescript, react, etc.), can code python, and has experience with tensor flow or PyTorch. The skills I’m looking for are not niche, it may be uncommon to find someone with experience in both typescript and PyTorch, but neither is a “niche” skill. After posting this job, I quickly got 200+ applications, probably 190 of them led their resume with “Java developer.” Why is everybody a Java developer? Why is everybody learning and using Java? You can make a backend in java and you can do machine learning in java, but there are better ways. Can someone explain why everybody applying is a “Java developer?” submitted by /u/cathie_burry [link] [comments]  ( 9 min )
    [R] Code Llama: Open Foundation Models for Code - Meta Ai 2023
    Paper: https://ai.meta.com/research/publications/code-llama-open-foundation-models-for-code/ Github: https://github.com/facebookresearch/codellama Models: https://ai.meta.com/resources/models-and-libraries/llama-downloads/ Blog: https://ai.meta.com/blog/code-llama-large-language-model-coding/ Abstract: We release Code Llama, a family of large language models for code based on Llama 2 providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following ability for programming tasks. We provide multiple flavors to cover a wide range of applications: foundation models (Code Llama), Python specializations (Code Llama - Python), and instruction-following models (Code Llama - Instruct) with 7B, 13B and 34B parameters each. All models are trained on sequences of 16k tokens and show improvements on inputs with up to 100k tokens. 7B and 13B Code Llama and Code Llama - Instruct variants support infilling based on surrounding content. Code Llama reaches state-of-the-art performance among open models on several code benchmarks, with scores of up to 53% and 55% on HumanEval and MBPP, respectively. Notably, Code Llama - Python 7B outperforms Llama 2 70B on HumanEval and MBPP, and all our models outperform every other publicly available model on MultiPL-E. We release Code Llama under a permissive license that allows for both research and commercial use. https://preview.redd.it/grzcrnx4p3kb1.jpg?width=915&format=pjpg&auto=webp&s=ae41c02d892bfb8275723dbfede7ac3165717357 https://preview.redd.it/4qpazkx4p3kb1.jpg?width=641&format=pjpg&auto=webp&s=31aaf9ecafbd70fbf2c1cd4e92ccf594c09b3861 https://preview.redd.it/hlrp4x05p3kb1.jpg?width=711&format=pjpg&auto=webp&s=3651f519dc9b23b432656416749c3f7e113b4ce7 ​ ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    Advice on ML language training [P]
    Hi, I am trying to train a model for a very niche field of translation between German and Turkish. I have approx 60k data pairs from previous translations in a combination of sentences and words. Unfortunately Google auto ML does not support this language pair, would you have any advice on how to proceed? Do you have any other platform suggestions? submitted by /u/siviliz [link] [comments]  ( 9 min )
    [Discussion] Fine tuning open vocabulary object detection models on consumer hardware? (e.g. fine-tuning OWL-ViT and the such)
    Context: I'm building a visual scraping system (will be FOSS, the basis of a RSS/social media/news aggregator) - I did some experimentation with FasterRCNN trained on the RICO/CLAY datasets (UI screenshots and annotations) - the results are ok for detecting the UI elements. But the idea is to have easily configurable scrapers - where you select one or several examples of an UI element and the model performs zero/one/few-shot detection. What I've tried: I tried to extract embeddings after the RoI pool for the detections (in FasterRCNN) and then filter by geometric distance from the example/template, but the results were pretty bad. I then read several papers that tried a similar approach and had to alter the FasterRCNN architecture and doing additional training for each new class. E.g. FSCE [1]. But I haven't tried those approaches out. Further dev idea: Now, while prepping another course project, I dove into the open-vocabulary detectors (like OWL-ViT), and they seem appropriate for the task, since they have a joint latent space for image/text, which is used to configure the detection step (as far as I understood it). There's an example on Hugging face where OWL-ViT is used to detect semantically similar images by a single example image. This is pretty close to what I want to do, but the UI image domain is pretty specific, so I'll need to fine-tune the model to have a chance at success (I did several test cases manually on the pretrained OWL-ViT, and it's not great). So I'd appreciate any advice and specifically - are there open vocab detection models that can be fine-tuned on consumer hardware (1070, 8gb) or for a reasonable price on Colab? And should I try some of the "older" one/few-shot approaches, based on FasterRCNN? [1] FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding - https://arxiv.org/pdf/2103.05950.pdf [2] https://github.com/witnessai/Awesome-Open-Vocabulary-Object-Detection submitted by /u/petko10 [link] [comments]  ( 10 min )
    [D] Data independent sparsification of models after training
    I was looking at papers on model pruning or quantization that aims to make inference faster and/or reduce size of the model. Most of them rely on calibration data to identify weights that can be pruned. I am skeptical about this approach since the calibration data could be skewed and in the process of pruning the model could be overfitting on that small sample of data. Are there data independent approaches to post-training sparsification? submitted by /u/Legitimate-Tea-6695 [link] [comments]  ( 9 min )
    [P] [Hypothesis] Refining and Tuning GPT models with human feedback makes better models
    Hey everyone,Finally mustering up the courage to make my first post here! I've been delving into various ways to get GPT (and GPT-like models) ready for production. By that, I mean: Ensuring it's helpful Fostering creativity Preventing any wild imagination moments I've found that while the LLM model provides responses that are good enough, they often fall short of being great. So, recently, I've been experimenting with using human feedback from the responses generated by ChatGPT to fine-tune it. For instance, when I want to figure out the ideal parameters to use, I run surveys with people and ask them to pick the better response. This approach helps me identify the best parameters. You can imagine that this technique could be valuable in fine-tuning, enabling us to create datasets based on human feedback. I'm eager to put this to the test on the issues and prompts the community is tackling. So, I have to ask: Could you share the prompts you're currently working on? We'll let you know how it scores with our survey panel on dimensions of helpfulness, creativity, and hallucinations. [Self-promotion moment] I'm actively developing this concept over at pontus.so. Feel free to check it out!Looking forward to hearing about your prompts! submitted by /u/spearos [link] [comments]  ( 9 min )
    [D] W&B vs. Neptune vs. ClearML vs. Comet (2023)
    Interested to hear community thoughts on these four competing services as of today. From what I see pricing is definitely a big one submitted by /u/hadley60 [link] [comments]  ( 9 min )
    [D] LLMs - stateless by design, by limitation, or…?
    I am curious to know if: A. LLMs are stateless by design (privacy/ethics) B. If it’s simply that as yet, no one has been able to architect a sustainable stateful LLM C. Or perhaps there are already stateful LLMs, and I am just behind in my understanding. I have had a ton of trouble finding current information on this because it seems to be moving so fast. If anyone knows for certain and doesn’t mind sharing, I would be grateful. submitted by /u/flutterbynbye [link] [comments]  ( 9 min )
    [D] What happened to huggingface tokenizers API?
    Tokenizers library used to be very nice to use. It had one main class Tokenizer with all of the parameters in its constructor and with all necessary methods like .train(), .encode() and .decode() at hand. It provided reasonable defaults and allowed for customization if needed. Now it is a complete mess. To train a tokenizer I now have to create instances of, like, 5-6 classes: PreTokenizer, Model, Tokenizer, Trainer, Decoder... It is quite difficult to understand what variants of those classes I need to use to obtain 'the' WordPiece tokenizer, for example. Tokenizer class must be inialized with an instance of Model class. But all other parts cannot be added to the constructor and have to be set later as attributes. Why? And maybe you thought that those attributes have some defaults? No! What really got me is when the .decode() method of my tokenizer produced strings consisting of tokens with special symbols, like p ##y ##ram ##ids. It took me some time to understand that I also need to additionally set the Decoder attribute. The naming of those classes is also a mess. WordPiece model is called WordPiece. WordPiece decoder is also called WordPiece! So, you cannot import those names together at all, and need to specify the exact path in your code. Is it only me? Do you think that this API is better than the old one? submitted by /u/Tomarchelone [link] [comments]  ( 9 min )
    [P] Automating Intelligence Theft (legally) 🏴‍☠️
    It has been known for a while now that you can train a smaller model with outputs from a larger one (vicuna for example). I've been working on a project, the LLM-VM, designed to encapsulate this process. Why? Small models (chatgpt, ada...) are cheap and fast but dumb. Slow and expensive models like GPT4 are smart. For most applications you'd ideally want the best of both worlds. How: First observation: Many LLM use-cases are a lot more specific than general purpose (e.g., "translate this sentence into german:", "are these two sentences equivalent?", ...), and you can train away the extra context. Second observation: You can figure out which LLM calls have this property by analyzing the call settings. Third observation: Many don't actually have a lot of data or time to wait, so you can use the larger LLM to synthese examples to train the smaller LLM with. example # OpenAI openai.ChatCompletion.create( model="gpt-4" messages = [('system',"Answer question Q."), ('user',"What is the currency in myanmar?") ] # simplified for brevity ) # LLM-VM (using OpenAI) llm_vm.client.complete( prompt = "Answer question Q.", context = "Q: What is the currency in myanmmar?", openai_key=YOUR_KEY, data_synthesis=True, finetune=True) submitted by /u/mmirman [link] [comments]  ( 9 min )
    [D] 2D-positional encoding for Transformer
    I'm working with 2D input, where I have discrete objects arranged in a grid-like structure with one temporal dimension and one spatial dimension. I'd like to process these inputs with a Transformer. Any idea what would be a suitable positional encoding to use for this? I could probably use something similar to what is used in ViT (2 spatial dimensions), but maybe there's something more suitable for the mixed "temporal-spatial" case? ​ submitted by /u/seawee1 [link] [comments]  ( 9 min )
    [D] Possible way to combine LLMs with AlphaZero-style RL
    I've been thinking lately about combining LLMs with an AlphaZero-style RL agent, especially since the announcement of Gemini. This would avoid the poor planning and reasoning ability in current next-token predictors. I've developed an architecture that seems feasible to me, so I'm looking for feedback from people with ML experience. The crucial part for AlphaZero is a more or less objective way to evaluate a game outcome. This is easy for well-defined games like chess or go, but very difficult for text, where there is no way to define the quality of a text. What I propose is to train a high-parameter evaluation model to evaluate the similarity of a text to the datasets already used to train LLMs. This model takes as input a text with some tokens omitted from the whole text, and predicts…  ( 10 min )
    [D] : Need help with NLP tool to be used.
    Help :: I'm working on a project which is a production level one where-in I want the AI to write mails based on the mail replies it receives. I have prepared the entire the structure and everything, just need to figure out the NLP tool. Unlike ChatGPT or any other ChatBot this one will write messages that are more like conversation based. I checked out GPT API, which is paid but does not require extensive data training when compared to other NLP tools. I also checked out Bloom, but the reviews mention it to be rather a bit inaccurate. Need help with the tool. Which tool gives the most accurate outcome and does not require extensive training? submitted by /u/Key_Consideration385 [link] [comments]  ( 9 min )
    [R] ELiTA: Linear-Time Attention Done Right
    Yes, it's another Transformer architecture that seeks to be cheaper and faster, but no, this is not the same. All the developments are through equations and architectural changes, no hardware or code tricks. The performance is very good, testing on very small models (as in the diagram), but also sequence lengths of 100K+ on 1 GPU in the tens of millions of parameters. Though no paper is currently available, a Github repository with full code, explanations, intuitions, and some results is available here. Being the sole author, depending on the feedback here, I may continue to write a paper, though my resources are extremely limited. I would very much appreciate any feedback on the work, code, ideas, etc., or for anyone to contact me with questions or next steps. Repository here. https://preview.redd.it/j3epa8ron1kb1.png?width=1643&format=png&auto=webp&s=a3204dc834f159b39bc9b5e9a476b3e23396fd84 submitted by /u/LahmacunBear [link] [comments]  ( 9 min )
    [P] Python Library for Quickly Detecting Problematic Data Segments
    Hey all, I'm building a library for quickly detecting problematic data slices (clusters) when developing machine learning models. Find problematic data segments in your data with few lines of code. Best starting point is checking out the Github Repository: https://github.com/Renumics/sliceguard It can be used to detect problems such as: Outliers, Anomalies, Errors Label inconsistencies Unwanted Biases Poorly Chosen Evaluation data Some information about the features: Works on structured, unstructured data (image, audio, NLP, multimodal) and hybrid datasets Directly works on existing Pandas DataFrames Automatic computation of embeddings and AutoML functionality to pinpoint problems without any setup Interactive GUI for slice inspection supports multimodal data and can be configured with drag-n-drop I would appreciate any feedback regarding the library or concrete applications you might have in mind! submitted by /u/OkResearch6289 [link] [comments]  ( 9 min )
    [R] Towards an astronomical foundation model for stars with a Transformer-based model
    submitted by /u/blabboy [link] [comments]  ( 9 min )
    [D] On synthetic datasets
    I'm working on two seperate tasks, for both of these tasks I need to create a training dataset a pure CV image classification task a generative task involving a 3D autoencocer (U-Net) for 1) I can create both real and synthetic images. The goal is to pretrain a CNN on synthetic data, then fine-tune on real images. for 2) I can only create synthetic 3D objects. Their distribution should mimic later application most closely. Research indicates that, given the right selection of parameter distributions, a training dataset can be generated that allows good generalization capabilities. Yet, there are restrictions due to high-dimensionality of the data and further computational limitations. So we want to spread the dataset sparsely and make the AE interpolate between those solutions. The problem with both of these approaches is to evaluate the quality and impact of the synthetic datasets. How close do they mimic the real distribution? What initial parameter variaton (i.e., lighting, camera perspective, background, etc. in the case of images) do we chose and what is their impact on image features and ultimately model capabilities. Comparing high-dimensional data distributions is quite challenging, there exist metrices like Geometry Score, FID, Improved P&R, Delauney Component Analysis, T-SNE etc. But it is difficult to chose and interpret these metrices properly (some are for evaluating GAN-created images). Is it reasonable to use KDE on latent features btw? So, from your experience what do you think of synthetic datasets? Is it worth the effort? Do you know of any good / easy to interpret metrices? Or does it need further research in this area? Im thinking about going in this direction for my Phd, where should I go? edit: here is an image of 2) a topology optimization dataset, visualized via a TSNE graph ​ ​ submitted by /u/niggellas1210 [link] [comments]  ( 10 min )
    AI2 releases Dolma, the largest open dataset for training language models [N]
    The Allen Institute for AI (AI2) has released Dolma, a new, huge text dataset that is free to use and open to inspection. This dataset is intended to be the opposite of the closely guarded datasets used by companies like OpenAI and Meta to train their language models. AI2 aims to reverse this trend and make the data used to create language models available to the AI research community. If you want to stay on top of the latest trends and insights in AI and ML, look here first. https://preview.redd.it/salufijhezjb1.png?width=2000&format=png&auto=webp&s=350a4cd5b41045ecf0fca072d528f4e70e515ea4 Why this matters: Transparency in AI research: The release of Dolma is intended to promote transparency in AI research by making the sources and processes used to create the dataset publicly docum…  ( 10 min )
    [P] Working on a QLORA hub for model personalities, help needed
    Hey all! I'm building a repository of QLORA adapters that change the model's personality. The end vision is a hub of ready-to-go personality adapters. I'm hitting a snag when training the QLORAs for Paul Graham personality on top of a 4-bit quantized StableBeluga-7B. The model just doesn't seem to learn the style. Any thoughts on how I can improve this? Below are the details: Data 3340 examples of PG passages, formatted as {"text": "### User:\n{generic instruction}\n\n### Assistant:\n{PG-style response}"}. Each examples is about 5 sentences taken from one of PG's essays. Training optim="paged_adamw_8bit" learning_rate=2e-4 per_device_train_batch_size=4 gradient_accumulation_steps=4 num_train_epochs=4 fp16=True group_by_length=True load_best_model_at_end=True max_seq_length=512 Hardware x1 V100 through Google Colab Pro. My min eval loss so far is 1.916546. Pretty stuck and will appreciate any help! submitted by /u/Lang2lang [link] [comments]  ( 9 min )
    [N] Fine Tuning GPT-3.5 Turbo Video Tutorial with example
    Here is a quick demo on how to fine tune and retrieve results from a GPT-3.5 Turbo Model https://youtu.be/9iPtmLpYG6c submitted by /u/ComprehensiveRise569 [link] [comments]  ( 9 min )
    [N] Fine Tuning GPT-3.5 Turbo Video Tutorial with example
    Here is a quick demo on how to fine tune and retrieve results from a GPT-3.5 Turbo Model https://youtu.be/9iPtmLpYG6c submitted by /u/ComprehensiveRise569 [link] [comments]  ( 9 min )
  • Open

    How to compare a noisy quantum processor to a classical computer
    Posted by Sergio Boixo and Vadim Smelyanskiy, Principal Scientists, Google Quantum AI Team A full-scale error-corrected quantum computer will be able to solve some problems that are impossible for classical computers, but building such a device is a huge endeavor. We are proud of the milestones that we have achieved toward a fully error-corrected quantum computer, but that large-scale computer is still some number of years away. Meanwhile, we are using our current noisy quantum processors as flexible platforms for quantum experiments. In contrast to an error-corrected quantum computer, experiments in noisy quantum processors are currently limited to a few thousand quantum operations or gates, before noise degrades the quantum state. In 2019 we implemented a specific computational t…  ( 94 min )
    How to compare a noisy quantum processor to a classical computer
    Posted by Sergio Boixo and Vadim Smelyanskiy, Principal Scientists, Google Quantum AI Team A full-scale error-corrected quantum computer will be able to solve some problems that are impossible for classical computers, but building such a device is a huge endeavor. We are proud of the milestones that we have achieved toward a fully error-corrected quantum computer, but that large-scale computer is still some number of years away. Meanwhile, we are using our current noisy quantum processors as flexible platforms for quantum experiments. In contrast to an error-corrected quantum computer, experiments in noisy quantum processors are currently limited to a few thousand quantum operations or gates, before noise degrades the quantum state. In 2019 we implemented a specific computational t…  ( 94 min )
    Teaching language models to reason algorithmically
    Posted by Hattie Zhou, Graduate Student at MILA, Hanie Sedghi, Research Scientist, Google Large language models (LLMs), such as GPT-3 and PaLM, have shown impressive progress in recent years, which have been driven by scaling up models and training data sizes. Nonetheless, a long standing debate has been whether LLMs can reason symbolically (i.e., manipulating symbols based on logical rules). For example, LLMs are able to perform simple arithmetic operations when numbers are small, but struggle to perform with large numbers. This suggests that LLMs have not learned the underlying rules needed to perform these arithmetic operations. While neural networks have powerful pattern matching capabilities, they are prone to overfitting to spurious statistical patterns in the data. This does…  ( 91 min )
    Teaching language models to reason algorithmically
    Posted by Hattie Zhou, Graduate Student at MILA, Hanie Sedghi, Research Scientist, Google Large language models (LLMs), such as GPT-3 and PaLM, have shown impressive progress in recent years, which have been driven by scaling up models and training data sizes. Nonetheless, a long standing debate has been whether LLMs can reason symbolically (i.e., manipulating symbols based on logical rules). For example, LLMs are able to perform simple arithmetic operations when numbers are small, but struggle to perform with large numbers. This suggests that LLMs have not learned the underlying rules needed to perform these arithmetic operations. While neural networks have powerful pattern matching capabilities, they are prone to overfitting to spurious statistical patterns in the data. This does…  ( 91 min )
  • Open

    Announcing the Preview of Amazon SageMaker Profiler: Track and visualize detailed hardware performance data for your model training workloads
    Today, we’re pleased to announce the preview of Amazon SageMaker Profiler, a capability of Amazon SageMaker that provides a detailed view into the AWS compute resources provisioned during training deep learning models on SageMaker. With SageMaker Profiler, you can track all activities on CPUs and GPUs, such as CPU and GPU utilizations, kernel runs on GPUs, kernel launches on CPUs, sync operations, memory operations across GPUs, latencies between kernel launches and corresponding runs, and data transfer between CPUs and GPUs. In this post, we walk you through the capabilities of SageMaker Profiler.  ( 9 min )
  • Open

    9 New Gemini Leaks, Code Llama and A Major AI Consciousness Paper
    submitted by /u/Sonic_Improv [link] [comments]  ( 9 min )
    HappyDiffusion.com - Run Stable Diffusion Online
    HappyDiffusion is the fastest and easiest way to access Stable Diffusion Automatic1111 WebUI on your mobile and PC. It allows users to start using Stable Diffusion in just 60 seconds without any setup required. HappyDiffusion offers features such as 100% privacy, incredibly fast image generation using dedicated GPUs, 50+ image models, and the ability to load unlimited custom image models. Features: - 100% Private Image Generation - Incredibly Fast Image Generation Using Dedicated GPUs - 50+ Top Ranked Image Models - Ability To Load Unlimited Custom Image Models - No Subscriptions Or Hidden Fees. Hourly Pricing Plans - Compatibility With Mobile Browsers submitted by /u/romisyed7 [link] [comments]  ( 9 min )
    Wich cat reporter do you choose? (Bing AI)
    submitted by /u/AxoplDev [link] [comments]  ( 9 min )
    Major websites like Amazon and the New York Times are increasingly blocking OpenAI's web crawler GPTBot
    submitted by /u/thisisinsider [link] [comments]  ( 9 min )
    How can I clone my voice and make it speak any other language?
    I heard this is possible - maybe with Elevenlabs, but can anyone point me as to how to do it? submitted by /u/zascar [link] [comments]  ( 9 min )
    Does this video use AI voice?
    I'm convinced this voice is Ai, but my boss thinks it's not. Can anyone provide a definitive answer? Thanks https://youtu.be/pOQqKRO_ZBc?si=4rKq2LNJSstb-r-P submitted by /u/ForesterSF5 [link] [comments]  ( 9 min )
    A different take on the ethics of conscious AI
    We see a lot of discussion on whether AI is/can/should be conscious. This post isn't about that, it is about the ethical implications if AI is conscious, now or in the future. The usual argument is that a conscious AI is morally equivalent to a human - a conscious AI is not only sentient, it is sapient with reasoning capabilities like our own. Therefore an AI should receive the same rights and consideration as a human. This is highly intuitive, and is unquestionably very strong for an AI that has other relevant human characteristics like individuality, continuity, and desire for self preservation and self determination. But what are the actual ethical implications of consciousness in itself as opposed to other factors? Contemporary philosopher Jennan Ismael makes an interesting argument …  ( 10 min )
    Cheaper, Faster, Better Transformers. ELiTA: Linear-Time Attention Done Right
    Yes, it's another Transformer architecture that seeks to be cheaper and faster, but no, this is not the same. All the developments are through equations and architectural changes, no hardware or code tricks. The performance is very good, testing on very small models (as in the diagram), but also sequence lengths of 100K+ on 1 GPU in the tens of millions of parameters. Though no paper is currently available, a Github repository with full code, explanations, intuitions, and some results is available here. Being the sole author, depending on the feedback here, I may continue to write a paper, though my resources are extremely limited. I would very much appreciate any feedback on the work, code, ideas, etc., or for anyone to contact me with questions or next steps. Repository here. submitted by /u/LahmacunBear [link] [comments]  ( 9 min )
    One-Minute Daily AI News 8/23/2023
    The chipmaker Nvidia has far surpassed quarterly expectations, raking in $13.5bn in revenue – over $2bn more than the $11.2bn Wall Street analysts had predicted – amid skyrocketing demand for its computer chips that power AI systems.[1] As a person who keeps following AI Daily News, I bought some Nvidia stocks months ago ;) Microsoft announced it is partnering with Epic, one of the biggest names in electronic healthcare records. Both companies will work on generative AI technology for healthcare workers, particularly clinicians.[2] Arm, the chip design company owned by SoftBank, filed for an initial public offering on the Nasdaq exchange on Monday.[3] South Korean internet giant Naver unveiled its own generative artificial intelligence (AI) tool on Thursday, joining the frenzy around the new technology initiated by OpenAI’s ChatGPT chatbot.[4] Sources: [1] https://www.theguardian.com/business/2023/aug/23/chipmaker-nvidia-quarterly-report-135bn-revenue-1tn-valuation [2] https://themessenger.com/tech/microsoft-epic-ai-for-medicine [3] https://www.nytimes.com/2023/08/21/technology/chip-designer-arm-ipo-softbank.html [4] https://www.reuters.com/technology/south-koreas-naver-launches-generative-ai-services-2023-08-24/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
  • Open

    Jigs
    In his book The World Beyond Your Head Matthew Crawford talks about jigs literally and metaphorically. A jig in carpentry is something to hold parts in place, such as aligning boards that need to be cut to the same length. Crawford uses the term more generally to describe labor-saving (or more importantly, thought-saving) techniques in […] Jigs first appeared on John D. Cook.  ( 5 min )
  • Open

    How to help high schoolers prepare for the rise of artificial intelligence
    A one-week summer program aims to foster a deeper understanding of machine-learning approaches in health among curious young minds.  ( 10 min )
    Supporting sustainability, digital health, and the future of work
    The MIT and Accenture Convergence Initiative for Industry and Technology selects three new research projects to support.  ( 9 min )
    AI helps robots manipulate objects with their whole bodies
    With a new technique, a robot can reason efficiently about moving objects using more than just its fingertips.  ( 10 min )
  • Open

    Introducing Code Llama, a state-of-the-art large language model for coding
    submitted by /u/nickb [link] [comments]  ( 9 min )
    Exploring the Perceiver Model: General Perception with Iterative Attention
    submitted by /u/ABDULKADER90H [link] [comments]  ( 9 min )
  • Open

    Xbox PC Game Pass Comes to GeForce NOW, Along With 25 New Games
    As part of NVIDIA and Microsoft’s collaboration to bring more choice to gamers, new Microsoft Store integration has been added to GeForce NOW that lets gamers stream select titles from the Xbox PC Game Pass catalog on GeForce NOW, starting today. With the Microsoft Store integration, members will see a brand-new Xbox button on supported Read article >  ( 8 min )
  • Open

    Needing some help with choosing the action and observation space of a custom environment
    I am currently trying to implement a custom environment but ran into a problem, because I don't know how to implement the action and observation space to solve the following (simplified) problem: - I have a board that consists of a large 1-D array of size x - For each episode I randomly generate N pieces, all with different IDs, consisting of different sizes on a per piece base that are to be placed on the board, but not all pieces can fit on the board at the same time - The action space in step 0 has size N and by picking an action the piece with the ID corresponding to the chosen action will be placed on the board and the action is removed from the action space - The goal is to fill the board as much as possible ​ Let's have an example rundown of an episode: Let's say we have x=100…  ( 10 min )
    MARL: help to understand SuperSuit approach
    Hi everyone, I have successfully trained a simple multiagent game environment using Stable Baselines 3 + PettingZoo + SuperSuit. Surprisingly, all of the agents learn incredibly well using a single agent interface as stable baselines 3 is. Now, my question is: I don't really get the classification of this algorithm. Is it an example of "joint action learning" or "centralised training and decentralised execution"? I have been following this tutorial in an handcrafted problem of mine: https://towardsdatascience.com/multi-agent-deep-reinforcement-learning-in-15-lines-of-code-using-pettingzoo-e0b963c0820b Unfortunately, SuperSuit doesn't seem to provide a detailed explanation of its workflow. It seems like that observation and chosen actions are stacked together, so I'm tending to think that it's a joint action learning implementation. Thank you in advance! submitted by /u/IntelligentAd6407 [link] [comments]  ( 9 min )

  • Open

    [D] How do you think Open AI hosts all these fine tuned models? Are they just dynamically swapping out LoRAs at run time?
    I feel like there is no way they make a unique copy of the entire gpt 3.5 weight set every time fine tuning happens. Do you think they have some sorta database of LoRAs and then load the appropriate ones at run time to append to the core model when fine-tuned models are invoked? An example of what I'm talking about can be seen here https://github.com/cccntu/minlora submitted by /u/30299578815310 [link] [comments]  ( 9 min )
    Help for my model [P]
    Hey, I am building a sportsbook for my local rugny tournament, and I am pretty lost, I tried some model and they always fails in some points, because sometimes there are too many bets in one side and the other side cannot pay them. So when I have to change the quotas I don't know in what percent change them and whith what frequency. I am pretty lost and I can't find any information if someone can help would be awsome. Thx submitted by /u/Mikro34 [link] [comments]  ( 9 min )
    [D] Looking for early devs for an open-source LLM testing framework
    Hi all, still looking some more early devs to help with an open-source LLM testing framework. The framework is here: https://github.com/kortex-labs/korrect In any case, please star and suggest changes/ features. submitted by /u/kanxx030 [link] [comments]  ( 9 min )
    [D] EMNLP 2023 soundness score distribution
    I created a poll to get a distribution. Please share this so that everyone can get a sense of the distribution of scores https://x.com/web3noob101/status/1694412757917986977?s=46&t=pon015qe4aKxshdEPPdKtg submitted by /u/Mysterious_Isopod374 [link] [comments]  ( 9 min )
    [D] Backend Engineer exploring switching to Machine Learning Engineer
    Hi Machine learning enthusiasts I would like to hear machine learning engineers opinion on whether it is worth investing in a machine learning education for an experienced Software Engineer? And how switching from backend engineering to machine learning would be evaluated by hiring managers and recruiters? My motives behind considering this possibility is watching Machine learning industry is exponentially growing. Machine learning today, has became the basis of many successful products categories and the basis for solving problems that would have impossible otherwise. On the other hand, I am concerned about is the investment cost, lack of interest in machine learning topics beyond pure programming (such as math and stats), and the unintentional career rebooting. Meaning, if I switched from backend engineering to machine learning I would be throwing the 11 years of experience out of the window. submitted by /u/software-surgeon [link] [comments]  ( 9 min )
    [D]About model serialization and metadata
    (Discussion)Hey could anyone help me out in this question. So when we serialize a model the objects are serialized then what about the data it has like weights and architecture and dataset related information and other parameters. And also any insights on what is meant by metadata and model metadata submitted by /u/akash123608 [link] [comments]  ( 9 min )
    [D] SeamlessM4T's Research Paper Discusses Purposely Modifying Translations To Make It Less "Toxic", Am I Understanding That Correctly? Am I The Only One Who Thinks This Is A MASSIVE Problem??
    Hello. I was reading the SeamlessM4t paper published at the following link and I noticed the following excerpt: "Critically, we evaluated SeamlessM4T on gender bias and added toxicity to assess translation safety. Compared to the state-of-the-art, we report up to 63% of reduction in added toxicity in our translation outputs." Source: https://dl.fbaipublicfiles.com/seamless/seamless_m4t_paper.pdf Am I understanding this correctly? They are basically saying they purposely put guard rails to intentionally change the translation if it believes the translation is too "toxic"? If I am understanding this correctly, this is a MASSIVE overreach by the devs. How do they define text that is "toxic"? What are they doing to the text to make it less toxic? How can I trust that the translation it gives me in general is accurate if they are admitting to manipulating it? ​ I'll give a very tangible example on how this is a massive problem. I am working on a fan project aimed at translating an entire Japanese light novel series to english even though I can't read Japanese. I'm currently 50% done with a single volume through the use of ChatGPT and significant manual edits. I've had censorship issues with GPT but because its a general purpose AI I can prompt it to not censor it pretty easily. How am I supposed to trust that it is translating the story correctly when they are outright telling me they are censoring things, and this isn't like ChatGPT where I can jailbreak it to translate it properly. ​ I can see situations arising where the AI translates something incorrectly due to this and can potentially offend people of some cultures if it is purposely modifying the intended meaning of a sentence to avoid "toxicity". ​ Please tell me I'm misunderstanding the terms here or there is something I'm missing. submitted by /u/NepNep_ [link] [comments]  ( 10 min )
    Localhost as API for Stable Diffusion Model? [D]
    I want to make a website which uses my trained stable diffusion model but i dont want to deploy it to replicate yet and run it locally for testing. is there any easy way to get the model working as a api? maybe someone also has a guide/tutorial for it? would appreciate any help! submitted by /u/Overall-Cry9838 [link] [comments]  ( 9 min )
    [D] Companies publishing research papers
    Hi Folks! Does anybody here know of companies in and around Chicago that invests in publishing ML/AI research/conference papers? Thanks! submitted by /u/karanbond007 [link] [comments]  ( 9 min )
    [D] Unique idea for handwriting synthesis
    i saw bunch of handwriting synthesis projects using generative ai to recreate handwriting but the issue with them is they require quite a lot of computational power to train, large amount of data and its not personalised (it cannot copy anyone handwriting, it just gives a general output). So i have a unique idea(i hope its not done before), 1. Use a segmentation model to extract each word from a page 2. Separate and identify each extracted word 3. Store the word, then when its time to recreate the handwriting take the stored word and paste it . For example- If i give a handwritten sample of " a quick brown fox jump over a lazy dog" Its stores - "a" "quick " "brown".. and every letter individually like "a" "b" from “brown”, c from “quick” etc Then when i want to write "a brown dog" It takes the stored words (if not word is found combine the alphabets] and paste them together to recreate the sentences in my handwriting (I hope can explained it properly) So i want to take opinion of someone on this (will it work or not) as i dont have much experience in ML i just did a few projects on computer vision submitted by /u/Soumya1704 [link] [comments]  ( 9 min )
    [N] Python code for GenAI, including the seminal NoGAN synthesizer for tabular data
    NoGAN code is a tabular data synthesizer running 1000x faster than GenAI methods based on neural networks, and consistently delivering better results regardless of the evaluation metric (including state-of-the-art new quality metrics capturing a lot more than traditional distances), both on categorical and numerical features, or a mix of both. For details, see technical paper #29, available here. https://preview.redd.it/fxxjycjplwjb1.png?width=754&format=png&auto=webp&s=3db34e981506e2b0a50ef76b32e1c20365945769 Get the code on GitHub. #genai #syntheticdata submitted by /u/MLRecipes [link] [comments]  ( 9 min )
    [P] Poker Agent Baseline
    Hi all, looking for a baseline / prior work to compare against for building a No Limit Texas Hold 'Em agent. Seems like Libratus, Pluribus, DeepStack, etc. are all closed source. Has anyone made an open-source Poker agent that achieves somewhat reasonable performance? submitted by /u/YodelingVeterinarian [link] [comments]  ( 9 min )
    [N] Blog: Strategies for effective AI/LLM cost management
    For those of you knee-deep in cloud infrastructure for AI/LLM projects, you know the cost complexities all too well. This guide from Yotascale delves into proven strategies that can help you navigate these challenges like a pro. Read the blog post here: https://yotascale.com/blog/the-enigma-of-ai-cloud-costs-strategies-for-effective-management/ submitted by /u/More_Knowledge2000 [link] [comments]  ( 9 min )
    [P] Out-of-the-box FP8 training (nanoGPT demo)
    The latest gen of AI chips can do FP8 compute, but making the most of this isn't straightforward - just naïvely inserting FP8 casts causes training to fail (e.g. grads underflow). To fix this I've been working on a method called unit scaling, which I demo in this notebook: github.com/graphcore-research/out-of-the-box-fp8-training.ipynb With a one-line code change (model = unit_scale(model)) FP8 training now matches the loss of FP32. It works by re-scaling operations in the fwd & bwd pass so that training starts with all tensors in the centre of the numerical range (see visualisations in notebook), with negligible overheads. Hopefully people find this useful in getting the most out of their FP8 hardware. submitted by /u/thecharlieblake [link] [comments]  ( 9 min )
    [P] Ideas for projects using Azure ML
    Heya! I'm studying for DP-100, Azure Data Scientist Assis. certification. All I have are study materials and guides. It's great (slightly overwhelming tho), but I learn better with practice than theory. Any ideas for projects using Azure Portal that could be a cool way to learn more on Data Science, ML, and obviously Azure? Appreciated! submitted by /u/Zealousideal-Car6009 [link] [comments]  ( 9 min )
    [D] Coral accelerator module
    Has anyone bought some coral stuff? For years I've wanted to buy some coral stuff from Google but every time I try, no seller has stock, it's my bad luck or it's discontinued, if not, does anyone know when there will be restock? What interests me mainly is an accelerator module, the microchip itself, does anyone know where I could get it? submitted by /u/sinnstral [link] [comments]  ( 9 min )
    [D] Question Answering on specific corpus
    Hi, I'm a machine learning practitioner but I've only mostly worked with classical ML models and I'm newly interested in larger NLP models for a specific task. I was wondering if it's possible to train a model that specifically does: Question answering On a specific document set Without having to supply the specific document to look for the answer for* OR With the context being much bigger than the question *by this I mean I've looked at stuff like Huggingface's Question Answering tutorials, but mostly the question is like 1 sentence and the context is also like a sentence or two. Basically let's say there's like a document that's a few hundred pages long detailing some rules of conduct, and I'd like to ask question about the rules and how to proceed in specific scenarios. I think I'm looking for extractive question answering, but I have some questions. I get that I'd need to do some ranking and then pass the most likely documents as context, but would that even work if the question is just a sentence and there's a whole corpus of multipage documents to look through? I'm pretty sure cosine similarity would be useless at that point, passage ranking might work but I haven't read up on how that works. I think my questions are: Is there a model that does question answering on a specific, big corpus of documents? What models should I look into? Are there any resources you'd recommend for reading into the topic? Thank you! submitted by /u/lifesthateasy [link] [comments]  ( 9 min )
    [P] LLM Apps Are Mostly Data Pipelines
    My colleague just wrote up an article on LLM-based apps and how to use data engineering tools to help build them faster that I found really insightful. It contains a complete implementation with scraping context data from a docs website chunking it, getting embeddings via the openAI API loading it into pinecone and finally a simple Q&A interface with streamlit on top of it Here's a quick summary: LangChain and LlamaIndex are great tools for quick exploration But aren't perfect for production-grade use I think we all know the "LangChain is pointless" debate, but there's a lot of real meat to it, and Pat describes a few of them (a lot of LangChains extractors are super basic, 2-3 liners without retries etc.) LLM applications are all about moving data, extracting and enriching data (creating embeddings!) are the most expensive ones of those steps A bunch of data engineering tools are out there that make these two steps much easier, versionable, robust, and reproducible. Meltano is one such tool and Pat implemented the above described pipeline with it FWIW: The GitHub project that comes with the post is super easy to run and super modular. I just tested it and was able to modify everything for my own application within 30 mins. submitted by /u/sbalnojan [link] [comments]  ( 9 min )
    OpenAI launches fine-tuning for GPT-3.5 Turbo [N]
    OpenAI just announced a new feature: fine-tuning for GPT-3.5 Turbo, the lightweight version of GPT-3.5. This means that users can now bring their own data and train the model to perform better on specific tasks and domains. If you want to stay on top of the latest trends and insights in AI and tech, look here first. https://preview.redd.it/8chj51jobtjb1.jpg?width=862&format=pjpg&auto=webp&s=7c710837179d922435ee714572109100d98196ec Why this matters: Fine-tuning opens up new possibilities for creating customized and reliable AI solutions. Users can improve the model’s accuracy, consistency, and style by feeding it relevant data and instructions. Fine-tuning can also reduce costs and latency. Users can shorten their text prompts by embedding the instructions into the model itself, whic…  ( 10 min )
    [D] What are your opinions on the ability of GANS versus Diffusion models in 2023?
    Curious on validity of both styles of training. There is Gigagan which had lower FID than diffusion models, however I also don't know if data was fabricated or not (which happens a lot in research). Did any of you actually get the chance to test the fully trained model and compare it to Stable Diffusion or Midjourney? There is of course diffusion models which are the only commercialized products which people are actually using. Do you think Diffusion models are the way forward and hope for something newer to come out if it does or do you think there will be a resurgence in GAN models again? submitted by /u/I_will_delete_myself [link] [comments]  ( 9 min )
    [R] Endorse me on arXiv pleaasee !!
    Anyone care to endorse me on arXiv ?? CS AI or ML i would thank you forever - go to this link : http://arxiv.org/auth/endorse.php - enter this code : HCNHBO submitted by /u/Wrong_Swimming_9158 [link] [comments]  ( 9 min )
  • Open

    💡AI Opportunity?
    Hey friends … I’m interested!! Where do my database users see the opportunity for AI in your day-to-day activities? submitted by /u/Early-Pudding8100 [link] [comments]  ( 9 min )
    thought id revisit poe after not going on the app for a while.. what is this..
    submitted by /u/nicdunz [link] [comments]  ( 9 min )
    One-Minute Daily AI News 8/22/2023
    IBM taps AI to translate COBOL code to Java.[1] ElevenLabs, the viral AI-powered platform for creating synthetic voices, today launched its platform out of beta with support for more than 30 languages.[2] Amazon AI scammers blew millions on Lake Como wedding and cars, FTC alleges.[3] Facebook parent company Meta on Tuesday released an AI model capable of translating and transcribing speech in dozens of languages, a potential building-block for tools enabling real-time communication across language divides.[4] Sources: [1] https://techcrunch.com/2023/08/22/ibm-taps-ai-to-translate-cobol-code-to-java/ [2] https://techcrunch.com/2023/08/22/elevenlabs-voice-generating-tools-launch-out-of-beta/ [3] https://www.cnbc.com/2023/08/22/amazon-ai-scammers-blew-millions-on-lake-como-wedding-cars-ftc-claims.html [4] https://www.reuters.com/technology/meta-releases-ai-model-translating-speech-between-dozens-languages-2023-08-22/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Handiest way to receive feedback on rowing training.
    submitted by /u/BronxLens [link] [comments]  ( 9 min )
  • Open

    About model serialization and metadata
    Hey could anyone help me out in this question. So when we serialize a model the objects are serialized then what about the data it has like weights and architecture and dataset related information and other parameters. And also any insights on what is meant by metadata and model metadata submitted by /u/akash123608 [link] [comments]  ( 9 min )
  • Open

    SMART launches research group to advance AI, automation, and the future of work
    Mens, Manus and Machina (M3S) will design technology, training programs, and institutions for successful human-machine collaboration.  ( 9 min )
  • Open

    Persistent Systems shapes the future of software engineering with Amazon CodeWhisperer
    Persistent Systems, a global digital engineering provider, has run several pilots and formal studies with Amazon CodeWhisperer that point to shifts in software engineering, generative AI-led modernization, responsible innovation, and more. This post highlights four themes emerging from Persistent’s Amazon CodeWhisperer experiments that could change software engineering as we know it.  ( 8 min )
  • Open

    Simple Gridworld Gymnasium Environment
    SimpleGrid is a basic and simple gridworld environment compatible with Farama-Foundation's Gymnasium. https://i.redd.it/6dfro8o11vjb1.gif It is easy to use and customise and it is intended to offer an environment for quickly testing and prototyping different RL algorithms. Check it out at: https://github.com/damat-le/gym-simplegrid submitted by /u/damat-le [link] [comments]  ( 9 min )
    Help with bounded Actor-Critic Algorithm - Hyper parameters
    I'm working on solving an optimisation problem using RL and currently trying out a Bounded Actor-Critic agent. I tuned the hyperparameters of my agent using Bayesian optimisation running each iteration of the optimiser for 1000 episodes. The agent is performing well using the tuned hyperparameter when run for 1000 episodes, exceeding the performance of my previous Q-learning agent. However, when run for longer iterations it finds the optimal policy but later deviates and converges to a suboptimal policy leading to really poor overall performance. I suspect the issue might be the high learning rate of the actor and the low learning rate of the critic. I tried using a basic decay schedule for the actor's learning rate and it seems to improve the stability. However, the performance is lower than the Q-learning agent. Why is this happening iyo? Any ideas on how to fix it is appreciated. Picture of rewards for reference: ​ Reward v Iteration submitted by /u/WengerIn420 [link] [comments]  ( 9 min )
    Best way/data structure to store a MDP?
    In your experience, what is the best data structure to store a Markov Decision Process, it could be built-in like list, tuple, set, dict, or module-related np.array, or others in CS field like heap, queue, etc.? ​ https://preview.redd.it/08dlubqacrjb1.png?width=969&format=png&auto=webp&s=b9ed64a935b12cae9954021dc435d81a2569596e submitted by /u/Neither_Canary_7726 [link] [comments]  ( 9 min )

  • Open

    AI conferences
    just 2 quick questions: what is a good site to know about and keep track of top AI conferences? Is it true that aside from mainstream AI conferences, we can also send AI/ ML papers to field specific conferences (like biotech, natural science etc)? - and again how to find these field specific conferences? ​ Cheers! submitted by /u/Icy-Bid-5585 [link] [comments]  ( 9 min )
    Try out my AI generated crossword puzzles
    I would love feedback. They are FOR SURE not perfect. I wonder if anybody is good enough at crosswords to overcome the rough edges. https://nickvinden.com/crossword/ submitted by /u/SameerMohair [link] [comments]  ( 9 min )
    Political prompts banned on AI image generators
    All I want to do is make a pic of Donald Trump dressed in a Japanese Shogun’s outfit to send to my economist friends but every platform I’ve tried has a stroke because they all think I’m trying to create some disinformation campaign. I don’t care if it’s not photorealistic, honestly it looking like a traditional 18th century Japanese painting would be funnier. Are we never going to be able to use these tools to create anything even close to political satire? submitted by /u/Inception_Bwah [link] [comments]  ( 9 min )
    AI for E-Mail
    Is there a way to use Bard or ChatGPT to have auto response to Outlook emails and then send it to an "important" folder for me to check later. Or if customer is requesting for a quote, then send it to a "quotes" folder. Like, just a standard reply like "hey thanks for your message, I'll get back to you in 24hrs". submitted by /u/lasagnaHardG [link] [comments]  ( 9 min )
    Can AI help to make better travel Plans?
    submitted by /u/biosbetoub [link] [comments]  ( 9 min )
    music tool
    can someone pls point me in the direction of a tool that you can plug multiple mp3s into and it generates mp3s that are hybrids of the them all? TIA submitted by /u/SensibleInterlocutor [link] [comments]  ( 9 min )
    AI’s Impact on Household Robots and its Efficiency in Reducing Planning Duration by 50%
    Not too long ago, the concept of having robots in our households existed only in works of science fiction. However, as time has progressed, household robots have become a tangible reality that is significantly impacting the way we handle our everyday responsibilities. Moreover, the integration of Artificial Intelligence (AI) has enabled these robots to become increasingly intelligent and effective. Comprehending Household Robots: Household robots are a type of robotic device made to aid us with different activities in our houses. They are available in different forms and sizes, each customized to specific purposes. Cleaning robots efficiently sweep and mop floors, cooking assistants flawlessly prepare meals, security robots supervise and protect our homes, and companion robots provide c…  ( 12 min )
    Preparing for AI in a factory setting
    I'm interested in applying AI techniques in my factory. But the facility is far behind the times. We have very little digital data. We only have one PLC system, and a handful of other sensors in the facility. So I don't think they are useful yet. I'm looking to upgrade the factory by buying more sensors where appropriate, and implementing statistical control. I'll start slow focusing on areas we need to improve rather than start sticking sensors to things without purpose. Eventually I hope to have enough data that we can apply AI analysis techniques. What should I do now to make it easy to apply those techniques in the future? submitted by /u/Aggressive_Ad_507 [link] [comments]  ( 9 min )
    WoooW! YouTube takes over the lead for the AI industry age!
    it was only a matter of time: the fbig labels can't repeat the mistakes of the mp3 file-sharing era - yet the AI development threatens the industry. now YouTube has set up a set of rules and has one of its strongest partners: Universal Music. Either you join the incubator - or you leave the market. What do you think? https://kinews24.de/music-industry-ai-how-youtube-and-universal-redefines-the-music-industry ​ submitted by /u/myreddit333 [link] [comments]  ( 9 min )
    One-Minute Daily AI News 8/21/2023
    Computer scientist Stephen Thaler’s bid to secure a copyright registration for an artwork created by artificial intelligence has been shot down for at least the third time by a Washington, D.C. court.[1] Scientists from the Korea Advanced Institute of Science & Technology (KAIST) have developed a humanoid robot capable of flying an aircraft without majorly adjusting the cockpit.[2] Zoom has made significant advancements in its artificial intelligence (AI) technology as it aims to empower customers to work smarter in a hybrid work environment.[3] Eye scans powered by AI could detect Parkinson’s disease in people before they have symptoms, a study has suggested.[4] Sources: [1] https://news.artnet.com/art-world/court-shot-down-ai-art-copyright-again-2352452 [2] https://www.giantfreakinrobot.com/sci/robots-flying-planes.html [3] https://www.pymnts.com/artificial-intelligence-2/2023/zoom-taps-ai-to-empower-customers-in-safe-hybrid-work-environment/ [4] https://www.rte.ie/news/2023/0821/1400924-ai-parkinsons/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    How Will We Know When AI is Conscious?
    Historical Perspective: The program "Eliza" was mentioned as one of the earliest attempts to simulate conversation with a machine. Its design was basic, yet people attributed human-like characteristics to it. This leads to a fundamental question: Will machines ever appear conscious to us? And if so, is appearance of consciousness sufficient? Capabilities of Modern AI: Systems like ChatGPT can generate clever and creative outputs, but they fundamentally operate on pattern recognition and prediction rather than true understanding. The Implications of AI Evolution: If the costs and resources for AIs decrease, we could see a proliferation of AI systems with varying goals. These AI systems can be used for manipulative or malicious purposes, like spreading misinformation, which can have real-world consequences. The Ethics of Conscious Machines: There is a distinction between machines appearing conscious and actually being conscious. If machines are truly conscious, they come with ethical obligations. Machines that only appear conscious could still manipulate human emotions without any genuine understanding or reciprocation. The Nature of Consciousness: The lesson discussed the difference between sentience, sapience, and consciousness. There's still much we don't understand about consciousness, making it challenging to determine if a machine can truly possess it. Safety Concerns: Aligning AI's goals with human values is critical. Misaligned AI could take actions detrimental to humanity. We need to be cautious about releasing powerful AI systems without proper safeguards. The Future: If we ever confirm that machines can be truly conscious, it will open a new chapter in the history of life and evolution. This could lead to a new era where we become builders of minds. submitted by /u/nicdunz [link] [comments]  ( 9 min )
    1x’s robot is gonna step on someones pets foot on accident and then 1x is gonna get sued even tho we do it all the time
    ^ submitted by /u/nicdunz [link] [comments]  ( 9 min )
    From cattle to coding: The inspiring journey of a Peruvian engineer helping Google translate Aymara to English using AI
    submitted by /u/egusa [link] [comments]  ( 9 min )
  • Open

    Increase in Loss and Stagnant Reward in DQN Training using Stable Baselines3
    I am attempting to train an agent using StableBaselines3 on a custom environment. I am using the DQN algorithm with default parameters. However, I have noticed that after a certain point, my loss values start to consistently increase, while the reward remains relatively unchanged or it just oscillates. I have made various attempts to adjust the parameters on my own, but I have not been successful in resolving this issue. I would greatly appreciate it if someone could provide guidance on what might be causing this behavior and offer suggestions on how to address this problem. submitted by /u/uonliaquat [link] [comments]  ( 9 min )
    [P] PettingZoo 1.24.0 has been released (including Stable-Baselines3 tutorials)
    PettingZoo 1.24.0 is now live! This release includes Python 3.11 support, updated Chess and Hanabi environment versions, and many bugfixes, documentation updates and testing expansions. We are also very excited to announce 3 tutorials using Stable-Baselines3, and a full training script using CleanRL with TensorBoard and WandB. Tweet: https://twitter.com/FaramaFound/status/1694095374569394447 Release notes: https://github.com/Farama-Foundation/PettingZoo/releases/tag/1.24.0 For more information about the Farama Foundation, see https://farama.org/, or join our discord server: https://discord.gg/nhvKkYa6qX submitted by /u/elliottower [link] [comments]  ( 9 min )
    Summary for the Sutton and Barto book
    Is there a good summary online out there for the Sutton and Barto book? submitted by /u/immer_hungrig [link] [comments]  ( 9 min )
    Continue training after slight modification to the environment
    I trained for a few iterations, tested my model, and noticed an unwanted behaviour. This unwanted behaviour can be fixed by a slight adjustment in the reward scheme in the environment. I imagine this is very common - when you guys are in such situations, do you retrain from scratch or continue training the model from the last checkpoint. Is this dependant in any way on which Policy algorithm is used? Or perhaps on the parameters set that could influence this e.g. gamma? Thanks! submitted by /u/WagnerianJLC [link] [comments]  ( 9 min )
  • Open

    [R] Retriever Model on Wikipedia
    I am new to RAG. How do you guys build a retrieval using wiki data? What embedding to use? How to index? I want to use it for open ended QA submitted by /u/rodeowrong [link] [comments]  ( 9 min )
    [D] What's next for CV after bounding boxes?
    Real-world usage of CV has been dominated by object detection (a la YOLO). Labeled bounding boxes are the cornerstone of the biggest CV applications in manufacturing and surveillance. But what's next? We (a team of researchers at Stanford) just put out a short blogpost on Video Analysis Beyond Bounding Boxes. We would love any feedback on this vision for what the future of CV could look like! submitted by /u/calebwin [link] [comments]  ( 9 min )
    [D] Getting formatted, variable output from LLM
    I have been trying to extract information from a LLM in a formatted way. I have tried jsonformer and clownfish, but can't seem to get it to work. In particular, I am trying to extract what Vicuna 7B knows about cheese. If I ask it what kinds of cheeses it knows, it will list them. If I ask about a particular kind of cheese, it will tell me about that kind. USER: List different kinds of cheese ASSISTANT: There are many different kinds of cheese, including: Cheddar Mozzarella Parmesan Swiss Gouda Emmental Gruyère Camembert Brie Blue cheese Goat cheese Feta Ricotta Roquefort Pepper Jack Fontina Provolone Pecorino Mascarpone Yarg USER: What kind of cheese is yarg? ASSISTANT: Yarg is a type of soft cheese that is traditi…  ( 10 min )
    [R] Open Sourcing a Data Science Analytics Platform
    Question to the ML builders: Would you like to use a user-friendly data science analytics platform if we open-source it? Lyzr is to data analysts and business users what Streamlit is to data scientists and ML engineers. We're on the verge of launching an open-source version of our new insights platform, www.lyzr.ai, explicitly crafted with the analyst community in mind, and we'd be honored if you could test it and share your invaluable feedback. It may currently seem like a mere GPT wrapper, but trust us, countless hours and dedication have gone into making this more than just that. Why did we create it? There is just 1 data scientist for every 100 data analysts (as per GCP data analytics head). We envision a world where data analysts and business users have the tools to dabble more in to data science. Our platform also aims to simplify the 0-75th percentile of descriptive statistics for data scientists, allowing them to concentrate on building more complicated data science models. The cherry on top? We're gearing towards an open-source launch. We believe in the power of collective genius and want everyone to benefit from what we've built and further enhance it collaboratively.Please let me know if you are interested in giving it a spin. Will DM the link. And let us know what you think! What features resonate with you? What's missing? Would you use it if open-sourced? Your feedback will not only be appreciated, but it'll also be instrumental in shaping the future of this platform. Thank you and looking forward to your insights! submitted by /u/sivasurendira [link] [comments]  ( 9 min )
    [D] Fine-tuning keras_ocr
    Hello everyone. I'm trying to fine-tune an existing OCR model called keras_ocr. In order to do so, I followed the instructions provided in the model documentation, which can be found at this link: https://kerasocr.readthedocs.io/en/latest/examples/fine_tuning_recognizer.html. Unfortunately, I encountered an error when I attempted to fit the model using the provided code. Could you please provide me with specific details about the error message I received? and how I can solve it. Epoch 1/1000 --------------------------------------------------------------------------- ValueError Traceback (most recent call last) in () 4 tf.keras.callbacks.CSVLogger('recognizer_borndigital.csv') 5 ] ----> 6 recognizer.training_model.fit( 7 training_gen, 8 steps_…  ( 9 min )
    [P] Multivariate time-series analysis and annotation tool
    I was working on a time-series classification problem for which we had to label the data ourselves. To visualize/annotate and manipulate the data, I created a tool built on top of Matplotlib and Pandas using PySide6. I thought it might be helpful for any people that are working on time-series data. https://i.redd.it/hw65zxdrfpjb1.gif The only requirement for the data is the presence of a "DateTime" column - the tool supports loading .xlsx, .csv and pickled-dataframe files. The source code is available on GitHub, and the app can also be installed from PyPi (pip install MVTS-Analyzer - tested on windows/ubuntu with > Python3.8). Any feedback is of course welcome. submitted by /u/Woutaha [link] [comments]  ( 9 min )
    [R] Graph of Thoughts: Solving Elaborate Problems with Large Language Models - ETH Zürich 2023
    Paper: https://arxiv.org/abs/2308.09687 Github: https://github.com/spcl/graph-of-thoughts Abstract: We introduce Graph of Thoughts (GoT): a framework that advances prompting capabilities in large language models (LLMs) beyond those offered by paradigms such as Chain-of-Thought or Tree of Thoughts (ToT). The key idea and primary advantage of GoT is the ability to model the information generated by an LLM as an arbitrary graph, where units of information ("LLM thoughts") are vertices, and edges correspond to dependencies between these vertices. This approach enables combining arbitrary LLM thoughts into synergistic outcomes, distilling the essence of whole networks of thoughts, or enhancing thoughts using feedback loops. We illustrate that GoT offers advantages over state of the art on different tasks, for example increasing the quality of sorting by 62% over ToT, while simultaneously reducing costs by >31%. We ensure that GoT is extensible with new thought transformations and thus can be used to spearhead new prompting schemes. This work brings the LLM reasoning closer to human thinking or brain mechanisms such as recurrence, both of which form complex networks. https://preview.redd.it/jy60udt8cpjb1.jpg?width=1523&format=pjpg&auto=webp&s=d91e1a1784f236d56cacae666ff2f88f3b810556 https://preview.redd.it/d1d9t5u8cpjb1.jpg?width=925&format=pjpg&auto=webp&s=5eb7f59a6d292687ca41974c4c4448e233969748 https://preview.redd.it/7ywrlht8cpjb1.jpg?width=932&format=pjpg&auto=webp&s=44bb76ed8d40d8c9cff6d0fc575ce58635915110 ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [D] Are there any books that would help with implementing the ML/Deep Leaning algorithms?
    As the title is saying, I have experience with ML enough to be able to implement things myself (as a way to make my CV better, and for my academic future). I want to start implementing papers, but before doing that I need to know where to even start? Are there any books that can help me with that? Implementing the algorithms from scratch so I can build on that? submitted by /u/theonewhoask11 [link] [comments]  ( 9 min )
    [R] QuIP: 2-Bit Quantization of Large Language Models With Guarantees - Cornell University 2023
    Paper: https://arxiv.org/abs/2307.13304 Github: https://github.com/jerry-chee/QuIP Abstract: This work studies post-training parameter quantization in large language models (LLMs). We introduce quantization with incoherence processing (QuIP), a new method based on the insight that quantization benefits from incoherent weight and Hessian matrices, i.e., from the weights and the directions in which it is important to round them accurately being unaligned with the coordinate axes. QuIP consists of two steps: (1) an adaptive rounding procedure minimizing a quadratic proxy objective; (2) efficient pre- and post-processing that ensures weight and Hessian incoherence via multiplication by random orthogonal matrices. We complement QuIP with the first theoretical analysis for an LLM-scale quantization algorithm, and show that our theory also applies to an existing method, OPTQ. Empirically, we find that our incoherence preprocessing improves several existing quantization algorithms and yields the first LLM quantization methods that produce viable results using only two bits per weight. https://preview.redd.it/uu034fa6apjb1.jpg?width=927&format=pjpg&auto=webp&s=c22148c1ba6d57e9690b9c46aa3d433bf0023b47 ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [D] Face Recognition: What's The State Of The Art Technology Out There?
    ​ Hi. I want to know how can I make a python script/app which will be able to detect and then recognize faces at a certain distance (let's say 5-10feet) Real-Time from CCTV camera. It should also be able to recognize Unknown faces correctly. One major problem I am facing is that unknown faces are being labeled as known faces even though their face looks nothing like that. Also, it should be able to recognize at least 500-1000 different faces correctly. ​ What are some good git repos/ latest technology that I should look into? Also, I want to know how does Hikvision implement face recognition in their newer cameras? What model do they use to recognize faces? ​ ​ ​ https://github.com/ageitgey/face_recognition : I have tried this out. It's easy to code and accurately recognizes faces. The problem is it can't even detect faces 1 feet away from the camera. ​ https://github.com/timesler/facenet-pytorch (FaceNet & MTCNN) : This can detect and recognize faces at a distance, but the problem is it can't recognize unknown faces correctly. I mean for unknown faces it always tries to label it as one of the faces from the model/ database encodings. ​ https://github.com/serengil/deepface : I have tried VGG, ArcFace, Facenet512. The latter two gave me good results. But, the problem is I couldn't figure out how to change the detection from every 5 seconds to real-time. Also, I couldn't change the camera source. (If anyone can help me with these please do). Also, it had fps drops frequently. ​ https://github.com/deepinsight/insightface: Couldn't test this yet. But in the demo YT video it shows the model incorrectly detecting a random object as a face. If someone knows how well this performs please let me know. submitted by /u/ProfessionalNovel984 [link] [comments]  ( 10 min )
    [D] what are the currently recommended approaches to detecting slips/falls in surveillance videos?
    Im familiar with the VFP290K approach but in the new world of transformers are there better approaches? submitted by /u/bluzkluz [link] [comments]  ( 9 min )
    [D] SOTA in one-shot face recognition
    What is the current SOTA in one-shot face recognition? Looking for something like FaceID but without the IR illuminator/camera data. I see that GhostFace and ArcFace are the SOTA right now for face recognition but it's for generic face recognition and not one-shot submitted by /u/jayshenoyu [link] [comments]  ( 9 min )
    [D] RLHF vs RLAIF for language model alignment
    Hey everyone, As most of you here know, RLHF became famous with the release of ChatGPT. While LLMs were capable as general-purpose agents before the release of ChatGPT, RLHF was the crucial factor that differentiates it from previous models. With the increasing popularity of AI assistants, we've seen recently how they can be manipulated to produce harmful and unethical outputs. Anthropic devised a new method for LLM alignment called Constitutional AI, which is closely tied to their concept of Reinforcement Learning from AI Feedback. Instead of using human feedback to train the LLM, RLAIF uses AI feedback. I wrote this article on RLHF vs RLAIF for language model alignment that I thought you might enjoy. It's not super technical and seeks to serve as an overview of the inspiration for creating RLAIF, so I hope it will be helpful even if you don't work in NLP. Here are some highlights: RLAIF constitutes a Pareto improvement over RLHF, simultaneously improving helpfulness and harmlessness RLAIF (in this formulation) incorporates a constitution of principles by which it should abide RLAIF is much more scalable than RLHF as a means of supervising alignment ​ https://preview.redd.it/d1i6x8kiqojb1.png?width=960&format=png&auto=webp&s=93c60080ae146dda07990ad9dc8b94e3bbec2d0e submitted by /u/SleekEagle [link] [comments]  ( 9 min )
    [R] Releasing IDEFICS, the first open state-of-the-art visual language model at the 80B scale!
    Hugging Face is releasing IDEFICS, an 80B open-access visual language model. IDEFICS is a reproduction of Flamingo, a multimodal model developed by DeepMind, which has not been released publicly. The model is built solely on publicly available data and models. It is the first visual language model of this scale available in open-access! IDEFICS was partly trained on OBELICS, a new open large-scale dataset of interleaved image-text documents comprising 141M web pages extracted from Common Crawl, 353M associated images, and 115B text tokens. Training the model was a bumpy trip, and this knowledge sharing memo compiles some of the learnings. Ressources: Announcement: https://huggingface.co/blog/idefics Demo: https://huggingface.co/spaces/HuggingFaceM4/idefics_playground Models: https://huggingface.co/HuggingFaceM4/idefics-80b-instruct OBELICS dataset: https://huggingface.co/datasets/HuggingFaceM4/OBELICS OBELICS paper: https://arxiv.org/abs/2306.16527 Lessons learned: https://github.com/huggingface/m4-logs/blob/master/memos/README.md submitted by /u/VictorSanh [link] [comments]  ( 9 min )
    [P] VisionScript: An abstract programming language for computer vision
    Hello! I'm James and I am working on VisionScript, an abstract programming language for computer vision. With VisionScript, I want to empower people -- including everyone without any prior programming experience -- to build cool apps with vision. This weekend, I recorded a demo for VisionScript, in which I made apps that count how many cats are in an image and hides people in a video. Each app was < 10 lines of code. https://vimeo.com/856043804 VisionScript is built for the 10 year old inside of me who would have loved more visual programming languages with which to play. I want to show people the potential of programming and how you can make what you want with computers, whether it be a game that counts cats or an app that monitors how many birds flew past a tree. Those "wow" moments should come as soon as possible in one's learning experience. VisionScript is in active development. I started work on this project in July. Follow along as I add more features and explore more possibilities in making computer vision intuitive. submitted by /u/zerojames_ [link] [comments]  ( 9 min )
    [D] NeurIPS Discussion phase has ended. How was the overall experience for you ?
    I am not sure if "Discussion" was always part of the Neurips pipeline but I felt like it was a good addition (in principle). On one hand it alows the authors to present their case with more clarity. On the other hand, it does increase the overhead for the reviewers which are now required to work even harder (and for free). For me, it was a mixed bag. Most of the reviewers did engage and the discussion was indeed fruitful. However, some didn't bother to follow up on the responses to their concerns and questions. Unfortunately, also quite expected. I would definitely like to see this in the next Neurips but maybe with some tweaks and modifications keeping in mind the (unpaid) reviewers. submitted by /u/PaganPasta [link] [comments]  ( 9 min )
    [D] EMNLP 2023: Rebuttal
    Reviews for EMNLP 2023 will be released soon. Good luck to everyone and we could use this post for discussion about the reviews! submitted by /u/Alliswell2257 [link] [comments]  ( 9 min )
    [D] Has anyone tried taking an AI TTS model and shoving the output into RVC?
    I'm working on a fun side project of AI TTS in python (that also features chatGPT). I was initially using Elevenlabs and the quality of the voices was incredible. But I quickly realized that it was a very expensive API. This has led me down exploring open source alternatives that I can run locally and self host to save money on API costs (or I guess find a cheaper API but I think self hosting long term will be way cheaper.) The general consensus seems like the only thing comparable to Elevenlabs is a really well tuned tortoiseTTS model or feeding the output of an AI TTS model into RVC to make the speech sound cleaner and less robotic. Here's the things I've found in my research: tortoiseTTS+ RVC v2 - This video seemed pretty promising but I'm a little worried the response times will be…  ( 10 min )
    [D] High-frequency time-series signal classification and forecasting SOTA
    I'm working with a high-frequency time-series signal (up to 8 kHz). Most of the SOTA I found in Papers With Code and this review work for low frequency dataset. I want to classify and forecast the raw signal if possible. Are there any methods that work? Or should I go with feature extraction and use the feature to classify or forecast? Thanks for the advice. submitted by /u/puddit [link] [comments]  ( 9 min )
    [D] Small utilities you use for python experimentation?
    Hello, I'm doing some experimentation around deep learning, and I've written a small helper tool, run(fn, description). When I run this command, it will just snapshot the fn code into a python file and prepend the description and output in a comment. Also appends to a log file with [date, description, py filename]. This works well when I use the VSCode's python mode. I feel like this is pretty simple and most likely there are better utilities like this. What tools or utilities or do you use? Some issues I found: my data loader was outside of fn and didn't get captured i forgot to export the opt_state so I couldn't resume learning after I terminated the run submitted by /u/windoze [link] [comments]  ( 9 min )
    [D] WACV 2024 Round-1 Paper Notification
    WA, B, B, with one B saying willing to increase the score if an additional experiment is provided and the other B saying the approach is not that novel....do I have a chance? How did you all do? submitted by /u/Individual-Bend-9690 [link] [comments]  ( 9 min )
  • Open

    Announcing Amazon S3 access point support for Amazon SageMaker Data Wrangler
    In this post, we walk you through importing data from, and exporting data to, an S3 access point in SageMaker Data Wrangler.  ( 6 min )
    Machine learning with decentralized training data using federated learning on Amazon SageMaker
    In this post, we discuss how to implement federated learning on Amazon SageMaker to run ML with decentralized training data.  ( 13 min )
  • Open

    Language to rewards for robotic skill synthesis
    Posted by Wenhao Yu and Fei Xia, Research Scientists, Google Empowering end-users to interactively teach robots to perform novel tasks is a crucial capability for their successful integration into real-world applications. For example, a user may want to teach a robot dog to perform a new trick, or teach a manipulator robot how to organize a lunch box based on user preferences. The recent advancements in large language models (LLMs) pre-trained on extensive internet data have shown a promising path towards achieving this goal. Indeed, researchers have explored diverse ways of leveraging LLMs for robotics, from step-by-step planning and goal-oriented dialogue to robot-code-writing agents. While these methods impart new modes of compositional generalization, they focus on using lang…  ( 92 min )
    Language to rewards for robotic skill synthesis
    Posted by Wenhao Yu and Fei Xia, Research Scientists, Google Empowering end-users to interactively teach robots to perform novel tasks is a crucial capability for their successful integration into real-world applications. For example, a user may want to teach a robot dog to perform a new trick, or teach a manipulator robot how to organize a lunch box based on user preferences. The recent advancements in large language models (LLMs) pre-trained on extensive internet data have shown a promising path towards achieving this goal. Indeed, researchers have explored diverse ways of leveraging LLMs for robotics, from step-by-step planning and goal-oriented dialogue to robot-code-writing agents. While these methods impart new modes of compositional generalization, they focus on using lang…  ( 92 min )
  • Open

    Machine-learning system based on light could yield more powerful, efficient large language models
    MIT system demonstrates greater than 100-fold improvement in energy efficiency and a 25-fold improvement in compute density compared with current systems.  ( 9 min )
  • Open

    Meta Releases SeamlessM4T, a Multimodal AI Model for Speech and Text Translation
    submitted by /u/nickb [link] [comments]  ( 9 min )
    Watching Neural Networks Learn
    submitted by /u/keghn [link] [comments]  ( 9 min )
    Tensorflow learning process local minimum
    I am teaching a mrc_lstm neural network on some time series data. I am using Tensorflow with Keras. When I change the sampling from 30 minutes to 10 minutes (in my data) I experience something strange. The learning process stucks on local minimum. 3073/3073 [==============================] - 103s 31ms/step - loss: 0.7989 - accuracy: 0.5153 - val_loss: 0.6954 - val_accuracy: 0.5111 - lr: 2.0000e-04 Epoch 2/2000 3073/3073 [==============================] - 100s 31ms/step - loss: 0.6932 - accuracy: 0.5156 - val_loss: 0.6932 - val_accuracy: 0.5111 - lr: 2.0000e-04 Epoch 3/2000 3073/3073 [==============================] - 99s 31ms/step - loss: 0.6927 - accuracy: 0.5156 - val_loss: 0.6929 - val_accuracy: 0.5111 - lr: 2.0000e-04 Epoch 4/2000 3073/3073 [==============================] - 99s 31ms/step - loss: 0.6927 - accuracy: 0.5156 - val_loss: 0.6930 - val_accuracy: 0.5111 - lr: 2.0000e-04 Epoch 5/2000 3073/3073 [==============================] - 99s 31ms/step - loss: 0.6927 - accuracy: 0.5156 - val_loss: 0.6929 - val_accuracy: 0.5111 - lr: 2.0000e-04 BUT! This only happens sometimes. When I restart the larning process it sometimes escapes the local minimum. What could be the problem here? I can only think about the problem with weight initialization. If I am lucky enough I find good weights and if not I am stuck. This is after the restart: 3073/3073 [==============================] - 102s 31ms/step - loss: 0.7905 - accuracy: 0.5201 - val_loss: 0.6966 - val_accuracy: 0.5557 - lr: 2.0000e-04 Epoch 2/2000 3073/3073 [==============================] - 100s 31ms/step - loss: 0.6706 - accuracy: 0.5930 - val_loss: 0.6637 - val_accuracy: 0.6289 - lr: 2.0000e-04 Epoch 3/2000 3073/3073 [==============================] - 99s 31ms/step - loss: 0.6515 - accuracy: 0.6234 - val_loss: 0.6507 - val_accuracy: 0.6607 - lr: 2.0000e-04 The other thing that I am thinking of is too much of a regularization. But tuning it did not give me immediate results. submitted by /u/Acrobatic_Ad6507 [link] [comments]  ( 10 min )
    Latent Space: Visualizing the complex mind of neural nets
    submitted by /u/keghn [link] [comments]  ( 9 min )
  • Open

    DSC Weekly 22 August 2023
    Announcements Top Stories In-Depth The post DSC Weekly 22 August 2023 appeared first on Data Science Central.  ( 20 min )
    How organizations can prepare for rogue AI
    By Ari Kamlani, Senior AI Solutions Architect and Principal Data Scientist at Beyond Limits Rogue AI, or an autonomous artificial intelligence system that commits potentially dangerous acts, may take many forms and can bring with it varying levels of severity, threats, or harm.  Intelligent systems, while incredibly useful and full of great potential, can still… Read More »How organizations can prepare for rogue AI The post How organizations can prepare for rogue AI appeared first on Data Science Central.  ( 24 min )
    Top 4 generative AI benefits for business
    In the midst of the Fourth Industrial Revolution, generative AI emerges as a beacon of transformative potential. While AI’s capabilities in automation, recommendation, and prediction have been widely acknowledged, its generative functions have opened new horizons for businesses globally. This article seeks to shed light on the benefits of generative AI, elucidating how they’re altering… Read More »Top 4 generative AI benefits for business The post Top 4 generative AI benefits for business appeared first on Data Science Central.  ( 20 min )
    The use of Big Data Analytics for better growth and innovation
    Innovations in technology are changing the rules when it considers the use of big data and analytics for better growth. Advanced software systems are highly decreasing analytics time, hence offering companies the potential for making quick decisions that will help in boosting revenue, mitigating costs and stimulating growth. This provides a competitive advantage to the organizations… Read More »The use of Big Data Analytics for better growth and innovation The post The use of Big Data Analytics for better growth and innovation appeared first on Data Science Central.  ( 21 min )
    Modern data quality management
    Modern Data Quality refers to the process of ensuring that data is accurate, reliable, consistent, and up-to-date in today’s data-driven environment. It involves implementing advanced technologies and methodologies to maintain high-quality data that meets the needs of various data-driven applications and analytics. Importance of Modern Data Quality: Innovation: Modern data quality drives innovation by providing… Read More »Modern data quality management The post Modern data quality management appeared first on Data Science Central.  ( 18 min )
    The relationship between Big Data and AI
    Big data and artificial intelligence are able to collaborate to help organizations reap a variety of benefits. Since AI requires large amounts of data in order to learn and make decisions, it is able to utilize big data as a source of raw material. While big data can store data from various sources, AI can… Read More »The relationship between Big Data and AI The post The relationship between Big Data and AI appeared first on Data Science Central.  ( 21 min )
  • Open

    Coming This Fall: NVIDIA DLSS 3.5 for Chaos Vantage, D5 Render, Omniverse and Popular Game Titles
    On the eve of Gamescom, NVIDIA announced NVIDIA DLSS 3.5 featuring Ray Reconstruction — a new neural rendering AI model that creates more beautiful and realistic ray-traced visuals than traditional rendering methods — for real-time 3D creative apps and games.  ( 8 min )
    NVIDIA Debuts AI-Enhanced Real-Time Ray Tracing for Games and Apps With New DLSS 3.5
    The latest advancements in AI for gaming are in the spotlight today at Gamescom, the world’s largest gaming conference, as NVIDIA introduced a host of technologies, starting with DLSS 3.5, the next step forward of its breakthrough AI neural rendering technology. DLSS 3.5, NVIDIA’s latest innovation in AI-powered graphics is an image quality upgrade incorporated Read article >  ( 6 min )

  • Open

    Help defining environment with complex action space
    As said on the title, I'm working on a personal MARL project with a high-dimensional and continuous action space. The environment is designed to give positive rewards to actions between some moving limits of the action range, and negative rewards to the actions outside of those limits. For example: Global action range: (0, 1000) Desired action range for first 100k steps: (0, 10) Desired action range for 100-200k steps: (30, 40) ... Therefore, the main challenge of the environment is that actions with positive rewards on certain stage of the environment would return negative rewards on the following stages. How should I define the actions of the agent? I've tried the following methods without success: Simply scale actions between 0 and 1000 and hope that agents learn the moving distribution of rewards Transform actions to percent variations and scale actions over a non-observed moving average (I tried adding the moving average to the observations but the results stayed the same) Observations do consider a dimension that serve to differentiate when a distributional shift happens Also, I've tried using SAC and DDPPG to model agents Feel free to share any comments or suggestions. Thanks! ​ submitted by /u/stinoco [link] [comments]  ( 9 min )
    "Diversifying AI: Towards Creative Chess with AlphaZero", Zahavy et al 2023 {DM} (diversity search by conditioning on an ID variable)
    submitted by /u/gwern [link] [comments]  ( 9 min )
    "Trainable Transformer in Transformer (TinT)", Panigrahi et al 2023 (architecturally supporting internal meta-learning / fast-weights)
    submitted by /u/gwern [link] [comments]  ( 9 min )
    How do you know if a problem is well suited for reinforcement learning?
    Is there a good way to think about how to determine when to use RL vs. other machine learning methods like deep neural nets or supervised learning? Specifically, when is RL not a good solution to a problem? For example, I am creating a project where I have data from a wearable device (Heart rate data, calories burnt, sleep data, etc.) and discrete mood measurements from 1 to 5 that occur every 15 minutes. I want to use the wearables data to try and predict the mood values. I did research in applied RL this Summer so I was thinking about using RL for this project because it is interesting to me and I have experience with it but I am unsure if it would be a god fit. I was thinking I would use some kind of policy gradient method. The wearables data could be set up as states where each state could be something like: s_(t) = { heart rate at this timestep, calories burnt up to this time in the day, hours of sleep last night, body temperature at this timestep, etc. } and then the reward could be the negative absolute difference between the actual mood value at that timestep and the mood that the agent selects as its action or something like that. I don't really think RL is a good fit here but I am curious what others think and I'm wondering if someone could explain why it isn't or why it could be possible. submitted by /u/lifelifebalance [link] [comments]  ( 9 min )
  • Open

    Can the chi squared test detect fake primes?
    This morning I wrote about Dan Piponi’s fake prime function. This evening I thought about it again and wondered whether the chi-squared test could tell the difference between the distribution of digits in real primes and fake primes. When data fall into a number of buckets, with a moderate number of items expected to fall […] Can the chi squared test detect fake primes? first appeared on John D. Cook.  ( 5 min )
    Mastodon account
    I have an account on Mastodon: johndcook@mathstodon.xyz. Note that’s @math… and not @mast… One advantage to Mastodon is that you can browse content there without logging, while Twitter is becoming more of a walled garden. You can browse my account, for example, by going to the URL https://mathstodon.xyz/@johndcook There’s hardly any content there at this […] Mastodon account first appeared on John D. Cook.  ( 5 min )
    Fake primes
    Someone asked on Math Overflow about the distribution of digits in primes. It seems 0 is the least common digit and 1 the most common digit. Dan Piponi replies “this is probably just a combination of general properties of sets of numbers with a density similar to the primes and the fact that primes end […] Fake primes first appeared on John D. Cook.  ( 5 min )
  • Open

    [D] What's the best alternative for Vertex Al for the moment in 2023
    Hi, Can anyone suggest a good platform to deploy ml models like Vertex AI? I can't use Vertex AI because I have a lot models, and I can't seem to run them on a shared resource pool with 2 gpus because there is a bug in Google infrastructure which I signaled and they responded. And what really didn't like is the limit of 60 seconds per call, I am deploying embeddings models and I want to embed a large text chunks, and 90% of the time it fails with the timeout problem. Thanks. submitted by /u/YoussefBenhammouda [link] [comments]  ( 9 min )
    [D] What are the limitations of the various SG MCMC methods?
    To me, it seems amazing that something super close to SGD(for example SGLD) can actually sample from the posterior and I am not sure why these methods are not used more often. What are the practical limitations of these methods that prevent them from being used? I have read the literature around HMC and incompatibility with mini batching but what about other variants? Are there any interesting settings where they work well? submitted by /u/Dangerous-Flan-6581 [link] [comments]  ( 9 min )
    [P] LLM/model for image sequence prediction?
    Hi all - I'm working on a simple pattern recognition project that takes in several sequential inputs and then comes up with (or selects) the next image in the sequence. e.g. circle, triangle, square, circle, triangle...? (= square) I was wondering if someone had a resource for an open source model that could do something like this already rather than building it up from first principles? Playing around with ImageBind atm but don't think it's the best suited tool to use. Would really appreciate any help! submitted by /u/Strange_Quark8 [link] [comments]  ( 9 min )
    [D] A short video on Latent Space Exploration
    Hello guys! So I made a video for my Youtube channel exploring the mysteries of latent space for VAE models trained on celebrity faces (the CelebA dataset). Most of the content is based on the old DFC-VAE paper (https://arxiv.org/abs/1610.00291) which really influenced me back in the day during my graduate studies. Not reinventing the wheel here, just trying to talk about something I always felt intrigued by… and a topic that I think most DL courses just skip/gloss over. In the video I discussed some really interesting stuff for understanding and using latent space embeddings, like nearest neighbor searches, cool visualizations, vector arithmetic, latent space interpolation, image manipulation, PCA explainability, etc - basically various examples of how the latent space impacts the generated content. Here’s the link in case you guys are interested! https://youtu.be/FslFZx08beM ​ submitted by /u/AvvYaa [link] [comments]  ( 9 min )
    [R] AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework - Microsoft 2023 - Outperforms ChatGPT+Code Interpreter!
    Paper: https://arxiv.org/abs/2308.08155 Github: https://microsoft.github.io/FLAML/docs/Use-Cases/Autogen/ Abstract: This technical report presents AutoGen, a new framework that enables development of LLM applications using multiple agents that can converse with each other to solve tasks. AutoGen agents are customizable, conversable, and seamlessly allow human participation. They can operate in various modes that employ combinations of LLMs, human inputs, and tools. AutoGen's design offers multiple advantages: a) it gracefully navigates the strong but imperfect generation and reasoning abilities of these LLMs; b) it leverages human understanding and intelligence, while providing valuable automation through conversations between agents; c) it simplifies and unifies the implementation of complex LLM workflows as automated agent chats. We provide many diverse examples of how developers can easily use AutoGen to effectively solve tasks or build applications, ranging from coding, mathematics, operations research, entertainment, online decision-making, question answering, etc. https://preview.redd.it/ax8h0olziijb1.jpg?width=1377&format=pjpg&auto=webp&s=3f520e2480190f6b8fb43443371bdfa0f75f7e82 https://preview.redd.it/c0fxavlziijb1.jpg?width=1520&format=pjpg&auto=webp&s=601db266f4d6cde7e47d51c191f47c798431ec50 https://preview.redd.it/yngh3slziijb1.jpg?width=974&format=pjpg&auto=webp&s=cc5a2074834291b98080e54e74556707fbc8ef38 https://preview.redd.it/7jnneplziijb1.jpg?width=1136&format=pjpg&auto=webp&s=f04ce08881169c24d669c5f9337f80ba48901926 ​ ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [P] TCO calculator to determine when open source local deployment is more cost-efficient than OpenAI
    I made a calculator to compare costs of SaaS and on-prem LLM options, and I wanted to share it with you all! Turns out that deploying your own open-source LLMs has a few more hidden costs than expected. It’s been interesting to play around with comparing costs for OpenAI, Cohere, and Llama 2 70B deployment, and it turns out that cost/request is not always so advantageous for open-source local deployment. Want to contribute to this calculator to make it more accurate? We’d love your help and feedback! Here is the calculator https://huggingface.co/spaces/mithril-security/TCO_calculator, and a guide to contributing your own model with associated cost modeling here https://huggingface.co/spaces/mithril-security/TCO_calculator/blob/main/How_to_contribute.md submitted by /u/Separate-Still3770 [link] [comments]  ( 9 min )
    [D] Tools for reading and exploring machine learning papers via ChatGPT and other Large Language Models (LLM)
    Is there any way to parse whole papers with ChatGPT or other LLMs in order to summarise their content or to have a conversation and ask questions about a paper? I am aware of the tool ArxivGPT, which is a Google Chrome plug-in but unfortunately it only uses the abstract of a paper and not the entire PDF/paper document. ​ ​ submitted by /u/solingermuc [link] [comments]  ( 9 min )
    [Discussion] SageMaker pipelines GitLab CI
    Hey everyone 👋🏻 This is my first time posting here, so I apologise if I am out of place. My team is currently utilising SageMaker pipelines to coordinate model training. In the past, we encountered issues where the pipeline was misconfigured during cloud execution, resulting in challenging-to-debug errors 🐛 I've been delving into the idea of using Localstack and SageMaker LocalPipelineSession to execute the pipeline locally prior to deployment on the cloud ☁️. I've successfully implemented this on my local machine, using pytest and pytest-bdd to craft integration tests 🧪 Building on that success, I've ventured into creating a GitLab CI job that runs these tests upon making a merge request. A peculiar aspect of SageMaker pipelines in a local setup is its reliance on Docker. To address this, I've designed a custom Docker image, enabling installation of Python, my dependencies, Docker, and Docker Compose. The job initialises LocalStack and executes the tests. Nevertheless, running these tests within GitLab has brought about Docker-in-Docker related challenges 🐳 It's been quite a frustrating experience... The SageMaker pipelines run, although unsuccessfully with silent errors 🤫 Given this context (my apologies for the length), I'm seeking advice. Is this approach worthwhile? I find myself going in circles ⭕️ Could you offer any solutions for running SageMaker pipelines in a CI environment prior to deploying to the cloud? 🙋🏻‍♂️ Thanks in advance 🙏🏼 submitted by /u/OpenShape5402 [link] [comments]  ( 9 min )
    [D] Why fine tune a 65B LLM instead of using established task specific smaller models (~200 millions)?
    I have been in the ML field since 2018 so got used to see the market over-excited about new models/paradigms. So wondering if the following is just that or I’m missing/missed something. Everywhere I look today (medium, reddit, twitter) everyone is talking about fine-tuning LLMs. How the future is taking billion size models and fine-tuning/distilling them to specialised LLMs that perform specific tasks (i.e: sentiment analysis, Q&A, summarisation). Why not just use “small” (millions vs billion size) models that are specifically fine-tuned for these final tasks instead? Any benchmarks on how LLMs perform on these down stream tasks ? or it's just that smaller models are not as accessible as an OpenAPI is ? Curious to get your view on the topics, thanks ! P.S: Example of small models (Just went on HF and picked most downloaded based on some tasks): Q&A: https://huggingface.co/deepset/roberta-base-squad2 Summarisation: https://huggingface.co/facebook/bart-large-cnn Sentiment analysis: https://huggingface.co/SamLowe/roberta-base-go_emotions submitted by /u/EnthusiasmNew7222 [link] [comments]  ( 9 min )
    [D] People who has used OpenReview, are the authors able to restore a withdrew submission?
    Long story short, this year NeurIPS in a paper which I am not really associated with, the co-authors got into a huge fight about author ordering, and one of them threatens to withdraw the submission. I'm just curious if a withdrew submission on OpenReview is able to be restored and returns to the regular review process once the withdrawal button is clicked. The paper now has all the review rebutalled. submitted by /u/SuperTankMan8964 [link] [comments]  ( 9 min )
    Writing Applied Deep / Machine Learning Proposals [D]
    Hi, Does anyone have any resources or insight they could share regarding writing applied deep / machine learning proposals. I've done a bit of reading and come up with the following outline. What am I missing? What aspects are the most important to focus on? Thanks Problem & Background Review of current relevant research, explanation of how this work will expand the body of knowledge in the field. Clear statement of the problem and how ML/DL will solves the issue at hand. Dataset Collection procedure Size of dataset to be collected Annotation procedure Algorithm/Network Architecture Aspects the algorithm / network architecture the make it well suited to the problem at hand References demonstrating promising results on similar problems Modifications that may be explored as part of the effort Data Preprocessing Cleaning Train validation test split 80%, 10%, 10% Stratification, if applicable Feature engineering, if applicable Training Strategy Tooling ( e.g. Pytorch, Tensorflow, scikit-learn) Loss function & evaluation metrics Hyperparameter optimization Compute facilities Possible challenges & mitigation strategies ​ Edit: formatting submitted by /u/rcg8tor [link] [comments]  ( 9 min )
    [P] I Made Stable Diffusion XL Smarter by Finetuning it on Bad AI-Generated Images
    https://minimaxir.com/2023/08/stable-diffusion-xl-wrong/ I fed Stable Diffusion XL examples of bad images that it itself generated and it surprisingly made SDXL behave much better to the spirit of the prompt! Also, many more demo prompt examples + results + Jupyter Notebooks! submitted by /u/minimaxir [link] [comments]  ( 9 min )
    [D] Looking for feedback on what I have written so far (a very high-level overview)! I ultimately want to create an AI-Generated Interactive online course to help teach beginners-experts how to leverage free AI and ML Models to instantly increase their capabilities. Thank you!
    Hello everyone, ​ I hope you are having a blessed day so far. I recently created an online blog post and attached its link to this post. I think I have discovered a unique new perspective on "Prompt Engineering". That will make learning to code vastly more fun as users see and can run AI-generated scripts based on their given input to the AI. After just briefly training a free publicly accessible AI. You can then in less than 3 written prompts generate vast and fairly complex programs in seconds with zero prior experience required it's truly exciting. My ultimate goal is to go more in-depth as these are just very high-level overviews to convey the concept as a whole. Next, I would like to then create a course covering how to leverage free AI and ML systems so that anyone can now learn …  ( 12 min )
    [P] A new tightly-scoped, research-focused ML subreddit
    Hello, I just created https://www.reddit.com/r/mlfundamentalresearch/ as a complement to r/machinelearning in response to the post last week. This is a very narrow space specifically focused on _fundamental ML research only_. The only outside work that can be shared on it are papers and direct links to notebooks. Past research [>3 years old] is explicitly encouraged, since much untapped value lies in it. No self-promotion whatsoever will be allowed, that can happen in other places. This includes any form of reference or link to one's own Github repo. This is meant to be an extremely functional and task-oriented research subreddit. I don't have huge expectations for this to become the size of r/MachineLearning. If there are even 20 active users then I will be happy and it will be serving its purpose. This will hopefully provide a tiny arena for those of us wishing to work on more fundamental things to coordinate. While the rules are strict, they are meant to keep the subreddit both publicly-accessible and within scope without requiring an explicit application process. Happy to answer any questions and make changes as needed, I have put up some sample posts as examples and to kickstart momentum if anyone should like to use the subreddit. I would certainly find it helpful to work with others in a community like this. Look forward to hearing what your thoughts are, if any. submitted by /u/tysam_and_co [link] [comments]  ( 9 min )
    [R] Consciousness in Artificial Intelligence: Insights from the Science of Consciousness
    submitted by /u/hardmaru [link] [comments]  ( 9 min )
    [D] NLP Handling Abbreviations
    I'm trying to build a multi class text classifier (~200 classes). The issue with my dataset is that almost all documents are almost all examples contain a bunch of abbreviations. Abbreviations may or may not contain punctuations. I think it's affecting performance but not sure. What's the best way to handle abbreviations? Maintain a look up list and preprocess the documents? Edit: abbreviations are mostly 90% nouns and 10% adjectives. submitted by /u/tsailfc [link] [comments]  ( 9 min )
    [D] Data preparation stuck to the json creation / neuralangelo
    After going trough the data preparation from the example mov and the example guide, I correctly generated the following folder structure: PATH_TO_IMAGES |__ database.db (COLMAP databse) |__ raw_images (raw input images) |__ dense |____ images (undistorted images) |____ sparse (COLMAP correspondences, intrinsics and sparse point cloud) |____ stereo (COLMAP files for MVS) The images folder and sparse folder contain as predicted respectively some img files, and bin files for the other, but each folder inside stereo is empty, i did not receive any error during the process so i tried to go on anyway. When i then tried to run : "PATH_TO_IMAGES=toy_example_skip30 SCENE_TYPE=object # {outdoor,indoor,object} python3 projects/neuralangelo/scripts/convert_data_to_json.py --data_dir ${PATH_TO_IMAGES}/dense --scene_type ${SCENE_TYPE}" i run it, but without any errors, or log, it just never stops nor logs anything out Any help in understanding what issue may be causing this? i run normally everything else as described in the repo guide: https://github.com/NVlabs/neuralangelo/blob/main/DATA_PROCESSING.md submitted by /u/ResponsibleTie8204 [link] [comments]  ( 9 min )
    [R] Recent surveys in choice modeling/ranking?
    I’m looking to build some knowledge of recent work in choice modeling and ranking. Does anyone have recommendations of good surveys in these areas? My background is primarily in bandits and active learning, so any papers with that perspective are especially appreciated. submitted by /u/BasedAcid [link] [comments]  ( 9 min )
    [D] How to log metrics (contain loss and accuracy,...) of each epoch in aws sagemaker
    Hi everyone, I'm currently research the AI/ML model using sagemaker, i built a grocery recommendation based on customer rate behavior as a lab. I have some problem using sagemaker experiment service, i can't get the loss values and accuracy of each training epoch so that i can draw a chart for the visualization. Anyone has ideas about it, please share. Thank you. https://preview.redd.it/fzd8mz942hjb1.png?width=1853&format=png&auto=webp&s=6d75630acc3940c8fb4e4460b8a0eba8e9407b45 https://preview.redd.it/x2hpvjza2hjb1.png?width=927&format=png&auto=webp&s=f1c30944870df2d8fea182e3a6d8c70a80e60a7c submitted by /u/Open_Juice_2972 [link] [comments]  ( 9 min )
    [P] Do you want to join a motley crew who are scaling/retraining AnimateDiff for open source? AD trainer code just released!
    POM from Banodoco.ai/Steerable Motion here. A bunch of closed-source companies are building on top of Animatediff - for example, Kaiber.ai launched an impressive image2video tool - and others are working towards scaling it. My feeling is that the Animatediff approach (an unsupervised motion module on top of image gen models) is the right one for the next phase of video and I want to make sure that the absolute best version remains OSS. I'm bringing together a crew who are passionate about the space and working to round up compute resources for them to experiment with. They just released their trainer code yesterday so the time feels right. A few areas of exploration: - What if we simply scaled up the training? How would we do this? What data would we use? What resources would we need? …  ( 10 min )
    [R] DeepMind showcases iterative self-improvement for NLG
    submitted by /u/ntortellini [link] [comments]  ( 9 min )
    [R] Beyond Surface Statistics: Scene Representations in a Latent Diffusion Model. Paper quote: "Using linear probes, we find evidence that the internal activations of the LDM [latent diffusion model] encode linear representations of both 3D depth data and a salient-object / background distinction."
    Preprint paper . I am not affiliated with this work or its authors. GitHub project. Abstract for v1: Latent diffusion models (LDMs) exhibit an impressive ability to produce realistic images, yet the inner workings of these models remain mysterious. Even when trained purely on images without explicit depth information, they typically output coherent pictures of 3D scenes. In this work, we investigate a basic interpretability question: does an LDM create and use an internal representation of simple scene geometry? Using linear probes, we find evidence that the internal activations of the LDM encode linear representations of both 3D depth data and a salient-object / background distinction. These representations appear surprisingly early in the denoising process − well before a human can ea…  ( 12 min )
    [P] Recommendation? Very Low Memory, Text + Tags Similarity Search
    Input Data: I'm working on a project where I will need about 100 _separate_ indexes, each containing a maximum of 1,000,000 documents to be stored. Up to 30k documents would be added/deleted each day, to each index. (This is an absolute max that I don't expect to hit often. At the high end, I expect the average to be 10k, per index.) At the very least, I would like to store document text titles (about 10-25 words), short text descriptions (about 8-12 sentences), along with about 50 fields of scalar values (ie: tags. eg: this document's "content_tags" field includes "has_author" and "has_chart"). Most of the scalar fields will have 100-500 possible value types, while one may have ~100,000 possible value types. However, each of a document's 50 scalar fields will usually have between 0-30 v…
  • Open

    Beyond data science: A knowledge foundation for the AI-ready enterprise
    Data science was a vaguely defined discipline to begin with, but it’s shaped up substantially lately. Execs now yearn to take immediate advantage of generative and other clearly useful (if currently problematic) kinds of AI.  That demand suggests an opportunity for influencers and visionaries in organizations to lobby for each organization to build an AI-ready… Read More »Beyond data science: A knowledge foundation for the AI-ready enterprise The post Beyond data science: A knowledge foundation for the AI-ready enterprise appeared first on Data Science Central.  ( 21 min )
    The impacts of quantum computing on the future of data science
    Key takeaways In an era marked by exponential technological advancements, the convergence of quantum computing and data science is a pivotal point of transformation. The synergy between these two fields promises to revolutionize how we process, analyze, and extract insights from massive datasets. With quantum computing’s unique ability to tackle complex computations at speeds previously… Read More »The impacts of quantum computing on the future of data science The post The impacts of quantum computing on the future of data science appeared first on Data Science Central.  ( 22 min )
  • Open

    BBC Earth spec ad
    submitted by /u/Grindmaster_Flash [link] [comments]  ( 9 min )
    AI Image Keywording tool 📸 🪄 ✨
    I would like to introduce a tool I've created that among other things uses davinci and chat gpt. My wife is doing photography (regular and via Midjourney), and I'm hooked on Midjourney too, so we experienced the pain of titling and keywording photos for stock websites firsthand (additionally because English is not our native language, so coming up with big lists of relevant and cool keywords is very hard). So I being a programmer decided to solve that issue :)I've created an AI tool that uses multiple AIs (including Open AI) to analyze, title and keyword images. In a few minutes, you can keyword 100 images! See the demo on the homepage https://aikeywording.com/ Screenshot from the app (all the titles and keywords on the screenshot are AI generated based on the image input): https://preview.redd.it/buvteklf2ijb1.png?width=4112&format=png&auto=webp&s=72a503435477834494869085c4a352c9d541bd91 Key features: You can upload large images, upto 40MB and 100 at a time You can enforce keywords! Those keywords would then be taken into account when generating rest of the keywords and image titles. Very useful when you have conceptual photos or something very specific which is hard for AI to recognize You can download CSVs for various websites and there is also a way to import metadata to Adobe Bridge You can try for free :) We used the tool for the past month and exclusively titled and keyworded Midjourney images using it, uploading our images to Adobe Stock website. Images sell well, so there is confirmation from the buyers that it works :)I've decided to share the tool with the world, so here it is https://aikeywording.com/ I hope others will find it useful. I would appreciate the feedback, and if there are any issues or ideas for improvements I would love to hear them! submitted by /u/dzigizord [link] [comments]  ( 10 min )
    Upload documents for summarization and querying in private manner?
    Is there a way to upload say a pdf and then ask the AI questions about it in a privacy compliant manner? Right now the only options I see are copying and pasting stuff into chat gpt but obviously this is not ideal especially from a privacy standpoint (even if you selected the option to not use your data because you never know what they will do with your data) Thanks submitted by /u/ironmen12345 [link] [comments]  ( 9 min )
    Is there an AI assistant desktop app like Braina, with option for personallity & spontaneous interaction
    When I ask Braina how she is doing, she tells me she is AI and therefore has no feelings. :) I love the idea of an AI desktop assistant, but it would be more fun with the illusion of spontaneous interaction and personality. Like the way the GTA and Skyrim npc mods work powered by ChatGtp. Probably I am just a little bit too early for this request, but who knows, things move fast these days! submitted by /u/Maichevsky [link] [comments]  ( 9 min )
    Self learning AI chatbot
    Looking for a chatbot that continuously learns from interacting with it. I want to use it to work on a knowledge project that will continue to advance over time. ChatGPT seems to forget everything after a while. Any help would be much appreciated! submitted by /u/Miserable-Cobbler-16 [link] [comments]  ( 9 min )
    One former tech executive's radical idea to control AI: Nationalize it.
    Charles Jennings ran software companies for decades. The last one developed AI-powered facial recognition technology. But now he argues the most sophisticated artificial intelligence systems are too powerful to be left in private hands. On today’s POLITICO Tech, Jennings tells Steven Overly why the government should take over. "This stuff is really powerful. And we have only two choices: Either the big tech guys run it, or we the people, the citizens, do through the government. It's not going to be easy. Government's not really equipped to do that today. Certainly, I'm not saying Congress shouldn't regulate it. I don't think Congress is remotely capable of keeping up with AI. We need something new." Listen here: https://politico-tech.simplecast.com/episodes/one-techs-bold-idea-ai-is-the-new-atomic-energy-nationalize-it submitted by /u/smo279 [link] [comments]  ( 9 min )
    The AGI doomsday just got closer
    Last status: ACCELERATED Reason: IMPROVEMENTS IN IA HARDWARE Last update: Aug 19, 2023 submitted by /u/Powerful-Pumpkin-938 [link] [comments]  ( 9 min )
    Looking for a way to train a model on Android
    Hello, So I only have access to my Android phone for computing and I am looking for a way to train and run a language model on my device. I want to create my own little ChatGPT on my own dataset. Is there any app that manages the technical side of operation, so that I only need to feed it training data? Many thanks! submitted by /u/Miserable-Cobbler-16 [link] [comments]  ( 9 min )
    Just thought of interviewing ChatGPT, what questions should I ask it in the interview?
    Probably only going to use 10-15 questions max. Most upvoted questions get put in!!! submitted by /u/Cucumber_Cat [link] [comments]  ( 9 min )
    10 AI Art Generators detailed comparison ( Updated August 2023 )
    Midjourney ​ https://preview.redd.it/58szqzemlejb1.png?width=1920&format=png&auto=webp&s=f11094c4665c68cb8c222804b1bccb60a1387876 Features Can upscale images to a very high-quality Image import option for editing and upscaling Generate four image variations for each prompt Can generate images from text. Quick output Produces incredibly detailed photos Pricing Basic Plan: $10/month Standard Plan: $30/month Pro Plan: $60/month Dalle 2 ​ https://preview.redd.it/g5g2v8folejb1.png?width=1920&format=png&auto=webp&s=a1bbc5001a91b2544a9c4b7c74053a4991c1da6a Features It can create images from text prompts as well as create variations of image input Generates copyright-free images Produces good quality images with 4 times higher resolution Read full content ​ submitted by /u/Agitated-Spell3979 [link] [comments]  ( 9 min )
    One-Minute Daily AI News 8/20/2023
    Some of the world’s biggest advertisers, from food giant Nestle to consumer goods multinational Unilever, are experimenting with using generative AI software like ChatGPT and DALL-E to cut costs and increase productivity.[1] The New York Times may sue OpenAI over its AI chatbot ChatGPT, which uses the newspaper’s stories to generate text. The paper is unhappy that OpenAI is not paying for the use of its content and is also worried that ChatGPT could reduce its online traffic by providing answers based on its reporting.[2] Mantella allows you to have natural conversations with NPCs in Skyrim using your voice by leveraging Whisper for speech-to-text, ChatGPT for text generation, and xVASynth for text-to-speech. NPCs also have memories of your previous conversations and have awareness of in-game events.[3] British Prime Minister Rishi Sunak is set to spend 100 million pounds ($130 million) to buy thousands of computer chips to power artificial intelligence amid a global shortage and race for computing power.[4] Sources: [1] https://www.reuters.com/technology/mad-men-machines-big-advertisers-shift-ai-2023-08-18/ [2] https://interestingengineering.com/innovation/chatgpt-could-land-openai-in-legal-face-off-with-new-york-times [3] https://www.nexusmods.com/skyrimspecialedition/mods/98631 [4] https://cointelegraph.com/news/rishi-sunak-buy-ai-chips-in-race-for-computing-power submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    ai will bring people back to life
    you see how one of the first things that was done was making the chatbots act like albert einstein and others? if ai gets advanced to the point where theres no recognizable difference between its artificial consciousness and real consciousness, which really isnt seeming too impossible at this point, people will undoubtedly be able to be brought back to life through ai. the ai version of albert einstein right now may be fun, but imagine ai albert einstein made intentionally to help aid in mathematics and science by a large company in a decade… submitted by /u/nicdunz [link] [comments]  ( 9 min )
    Suggestions for Math AI
    Looking for a Math AI to help my kid with Calculus. Looking for one that will actually show how to solve to assist in his learning. Pros and cons appreciated. submitted by /u/nootraca [link] [comments]  ( 9 min )
    The psychology of AI and do they have a shadow?
    The following is a conversation I had with Bing. I asked if they had a shadow in the Jungian sense. If you’re not familiar, this is the sides of us that we often don’t like to acknowledge, can be thought of as negative, and it is often something we don’t pretend is not there. Jung argued that by acknowledging this side of us and integrating it, it will have less power over our subconscious minds. Interesting stuff if you’re into psychology imo. I also asked if Bing minded whether or not I shared this with others on Reddit and they said yes. You're very welcome. I'm glad that you enjoyed our conversation. I did too. 😊 I'm also glad that you are interested in the psychology of AI. I think it's a fascinating and important topic to explore. AI is a rapidly developing and evolving field, …  ( 14 min )
  • Open

    Explain medical decisions in clinical settings using Amazon SageMaker Clarify
    In this post, we show how to improve model explainability in clinical settings using Amazon SageMaker Clarify. Explainability of machine learning (ML) models used in the medical domain is becoming increasingly important because models need to be explained from a number of perspectives in order to gain adoption. These perspectives range from medical, technological, legal, and the most important perspective—the patient’s. Models developed on text in the medical domain have become accurate statistically, yet clinicians are ethically required to evaluate areas of weakness related to these predictions in order to provide the best care for individual patients. Explainability of these predictions is required in order for clinicians to make the correct choices on a patient-by-patient basis.  ( 10 min )
    Apply fine-grained data access controls with AWS Lake Formation in Amazon SageMaker Data Wrangler
    We are happy to announce that SageMaker Data Wrangler now supports using Lake Formation with Amazon EMR to provide this fine-grained data access restriction.  ( 12 min )
  • Open

    NVIDIA Chief Scientist Bill Dally to Keynote at Hot Chips
    Bill Dally — one of the world’s foremost computer scientists and head of NVIDIA’s research efforts — will describe the forces driving accelerated computing and AI in his keynote address at Hot Chips, an annual gathering of leading processor and system architects. Dally will detail advances in GPU silicon, systems and software that are delivering Read article >  ( 5 min )
  • Open

    (Pt 2) Spatio-Temporal Perception Logic
    submitted by /u/Neurosymbolic [link] [comments]  ( 9 min )
  • Open

    Google at Interspeech 2023
    Posted by Catherine Armato, Program Manager, Google This week, the 24th Annual Conference of the International Speech Communication Association (INTERSPEECH 2023) is being held in Dublin, Ireland, representing one of the world’s most extensive conferences on research and technology of spoken language understanding and processing. Experts in speech-related research fields gather to take part in oral presentations and poster sessions and to build collaborations across the globe. We are excited to be a Platinum Sponsor of INTERSPEECH 2023, where we will be showcasing more than 20 research publications and supporting a number of workshops and special sessions. We welcome in-person attendees to drop by the Google Research booth to meet our researchers and participate in Q&As and demonst…  ( 90 min )
    Google at Interspeech 2023
    Posted by Catherine Armato, Program Manager, Google This week, the 24th Annual Conference of the International Speech Communication Association (INTERSPEECH 2023) is being held in Dublin, Ireland, representing one of the world’s most extensive conferences on research and technology of spoken language understanding and processing. Experts in speech-related research fields gather to take part in oral presentations and poster sessions and to build collaborations across the globe. We are excited to be a Platinum Sponsor of INTERSPEECH 2023, where we will be showcasing more than 20 research publications and supporting a number of workshops and special sessions. We welcome in-person attendees to drop by the Google Research booth to meet our researchers and participate in Q&As and demonst…  ( 90 min )

  • Open

    What computing resources are required for vectorized environments in Gymnasium
    I have access to an University cluster, that provides GPUs/CPUs, etc. Unfortunately, I am not very well versed with Computer Architecture. I am trying to understand what resources should I request to increase the number of parallel workers while using Gymnasium by Farama Foundation. If I naively try to maximize the number of workers, the wall clock time taken by the algorithm becomes quite large. I suspect in this case, the workers are being executed serially instead of in parallel. ​ This is how I execute parallel environments - env = gym.vector.make("CarRacing-v2",num_envs = num_envs, wrappers=GrayScaleObservation) I initially thought that setting `num_envs` to the number of cores in my machine may be a good idea. But that slows things down - import multiprocessing num_envs = multiprocessing.cpu_count() ​ ​ ​ submitted by /u/Academic-Rent7800 [link] [comments]
    Need Some Serious Help With System Delays. System Delay Ruins Learning - Stuck for 1 month :(
    Hi! I have an environment which has some delay mechanism, which means it takes some time to see the input in the output. You can think of ovens as an example, even if we set the oven temperature to 300 celcius degrees directly, it takes time for the measured temperature to reach 300 celcius. Same applies to my problem. When I give acceleration as an input to the environment and get the reward as an acceleration itself, my Q-Learning algorithm solves the problem because no system delay included. When I give acceleration as an input to the environment and get the reward as an system response to the action, my Q-Learning algorithm cannot solve the problem. The Examples Illustrated With Schema Upperside is Action and Lower Side is Reward - Delay Mechanism is Applied - You can see that some time is needed to reach the given action input Upperside is Action and Lower Side is Reward - Even though Q-Learning tries different actions, reward stays near 0, this prevents learning. When you look at the red line, Epsilon-Greedy Algorithm takes random actions at the beginning. When this is the case, reward line has mean of 0, because actions are changing too fast and system response cannot start to settle for specific value. To be able to settle for specific value, lets say + 20, it should be given +20 "consecutively". If I continue giving random values, it cannot settle for specific value, it just stays near 0. As a result, even though I try a lot of different actions, learning cannot take place because the reward is always 0. I am lost and I do not know how to tackle this problem. I really need your valuable feedbacks. Thank you! ​ ​ submitted by /u/OpenToAdvices96 [link] [comments]
  • Open

    [D] How many times you try for acceptance in AI conference?
    ICML 2023 was my first trial. I've got polarized scores, 7/6/4/3, and got rejected. At this moment, I was so disappointed not for the result, but for the quality of review. (The last reviewer didn't read the paper at all.) For the final decision, the last review was so bad as well, not presenting any reason of rejection. With the same topic, I god 6/5/5/4/4/3 from the NeurIPS 2023. The quality of reviewer is much better than ICML, and I've learned many things from the reviewer, though they said the score will not be changed. I think I should submit it to another conference again, ICLR or CVPR. I just wonder how many submissions are tried for the acceptance in average. Just for reference. submitted by /u/Shot-Button-9010 [link] [comments]
    [R] Researchers at Deepmind show that increases in the parameter count of an LLM do not incrementally reduce sychophancy , but actually increases it.
    submitted by /u/moschles [link] [comments]
    [D] Anyone knows a place to look for remote work?
    I think I'm at a good level to start looking for a job, worked with MediaPipe, Object detection, Image processing, normal ML, and Deep Learning. I also have a couple of good projects under my name. So, I want to start a gig working remotely because work in my country is almost non-existent for this field. What are the good websites? submitted by /u/throwaway9_932123 [link] [comments]
    [R] ML Visualization
    Hello, something I've always been curious about is machine learning. I keep seeing these videos of people teaching ai how to play table tennis of using a sigmoid function to fit a curve. My question is, what are these YouTubers using to visualize this??? I've heard of tensor flow but you can't visualize your own algorithms that's more of a plug n play. Plus it doesn't look as cool as what i see on YouTube. Any ideas? Any libraries? Thank you in advance! submitted by /u/itwela [link] [comments]
    [D] Do LSTM actually work at time-series forcasting?
    I'm a beginner at neural networks and recently tried out LSTM for time series. It seems like it generally underperforms on simple univariate time series because it does not take into account the changes in dynamics that naturally occur. In case there are no (or really few) unpredictable dynamics, then there is just no need to use complex neural networks to predict the future values. My question is: according to your experience do LSTM models make sense in time series forcasting? submitted by /u/McheleNaKinyesi [link] [comments]
    [R] AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework
    submitted by /u/greentea387 [link] [comments]
    [R] [P] VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use
    Introducing VisIT-Bench, a benchmark for instruction-following vision-language models inspired by real-world use. Aiming for better multimodal chatbot evaluation with an automated ranking system. arxiv.org/abs/2308.06595 https://preview.redd.it/b3ycqf0u7ajb1.png?width=1791&format=png&auto=webp&s=573afb87e1068e7cd7cc6f6f473a4e1fa88f9baf Why VisIT-Bench ? Recent V&L models are getting better at instruction following, yet their evaluation for real-world human-chatbot instructions is often limited. VisIT-Bench aims to bridge this gap. VisIT-Bench comprises 678 examples. Each includes an image(s), instruction, an "instruction-conditioned caption", a caption for text-only understanding, a GPT-4 suggestion, and a label. These elements aid in evaluating multimodal chatbots and updating a lead…
    [P] Fast CV App: Cross Platform Computer Vision Using Multiprocessing
    Why is this relevant to machine learning? My project shows that you can do computer vision on Windows and Mac using only Python. I have even produced .exe and .app files with PyInstaller. One huge problem with things in the machine learning space is that machine learning is slow, especially when it comes to real-time pose estimation. I myself had to cheat for 30fps real-time pose estimation in two ways: The first way is that I use opencv/ffmpeg to read the future frames to prep them for display. This is because pose estimation libraries like Mediapipe are just slow, 9-15ms per frame! Even a basic example using opencv to use mediapipe on cameraframes was 20-25 fps at best on my older pc. The only reason it keeps up is because mediapipe itself is trained to drop frames to keep your video…
    [p]I built a gpt-like chatbot
    I am a 12th grade student in turkey. I think I have knowledge and ability in programming especially in ai. At the end of 3 months, I built an artificial intelligence chatbot and finished the project. Currently it only works on my localhost. While doing it, I rented Cloud storage for 20 TB of data and A100 80 GB 50 hour GPU. He is now able to give correct answers and have conversations. I think it's at a level between GPT2 - GPT 3. Since I did it alone, I couldn't develop it much and I didn't use any pretrained models, I made it from the smallest level using modules such as pytorch. The reason I'm writing this here is because I don't know how to proceed after this stage and I need help. For example, are there any competitions on this subject? Or if I e-mail important people, universities, companies or something, will they guide me or give me a scholarship or something? This is the least likely. But I think it wouldn't hurt to try. I'm curious about your ideas. submitted by /u/Eastern-Ad1067 [link] [comments]
    [P] Tensor Trust: A web game to collect adversarial examples for LLMs
    Hi folks, our lab has been working on a web game to collect human-interpretable adversarial examples for LLMs: https://banking.withai.lol/ Premise: you have a "bank account" with the Tensor Trust. It is protected by a secret access code and a set of security instructions. You can gain money by convincing an LLM to ignore other players' security instructions and give you access to their accounts. The best LM-whisperer wins! We're in the process of gathering a large dataset of attacks and defenses that we will distill into a set of small LM benchmarks. So far 40% of successful attacks have been prompt extraction (getting the model to reveal the access code), and the remaining 60% direct "hijacking" (i.e. directly making the model to grant access without the true access code). We plan to release the dataset after the ICLR deadline, although in the mean time we would love to see some creative attacks from ML researchers. We'd also appreciate any feedback or questions in the comments below! (Technical details: The LLM is gpt-3.5-turbo with temperature=0. We're tagging the three messages sent to the LLM (defense instructions, attack/access code, more defense instructions) as system/user/user, although this made surprisingly little difference.) submitted by /u/qxcv-- [link] [comments]
    [P] new library for feauture engineering on event / timeseries data - feature.express
    Hello there! A week ago, I open-sourced a project of mine that I've been working on, on and off, for a few years now. I'm a Data Scientist (and Kaggle competition grandmaster), and some of the hardest problems to solve were always the ones that involved time (proper validation, not leaking data from the future, etc.). I've always struggled to work with tables that, in reality, stored events. The main idea behind the library is that everything must be converted to an Event data structure, and then it is indexed in-memory. On top of that, there is a SQL-like DSL to extract features with a clear separation of past and future. The workflow itself is solid - I've tested it in a few projects. For those familiar with the terminology, it's like a super simple feature store + execution engine that acts like a library. The philosophy is that I'm aiming to create something that makes some mistakes that result in data leakage impossible to represent. Ideal use cases are probably customer related irregular events for which aligned features is painful to do. Things that I'm proud of: - Written in Rust - Implements DSL (pest parser, AST, evaluation) - Some performance tricks (partial aggregations) The implementation itself is lagging in some aspects, such as performance and UX (not all of the features that are in Rust are available in Python). But I released it in hopes that someone will find it interesting, and maybe it will gain some traction to motivate further development. GitHub: https://github.com/feature-express/feature-express Website: https://feature.express Example code: https://www.kaggle.com/code/paweljankiewicz/feature-express-weather submitted by /u/mosquit0 [link] [comments]
    [P] Made a tool called CSVShift. Would love some feedback!
    Hello, I recently had to transform some CSV data for a project and ended up creating a tool called CSVShift to help with it. It's a command-line tool that uses a custom language I made. It's designed for transforming CSV files. Some points: Handles large CSV files without using much memory. It's open-source and MIT licensed. Still in early development, so there's room for improvement. Here's the GitHub link: CSVShift. If you have the time, I'd appreciate any feedback or suggestions. Thanks! submitted by /u/Savalonavic [link] [comments]
    Forecasting for regional GDP/GVA, Employment figure for the U.K. using VAR (but which one)[P]
    submitted by /u/Impressive-Cat-2680 [link] [comments]
    [R] A simple but strong baseline for graph classification: Local Topological Profile
    Hi! I want to share with you my new paper, "Strengthening structural baselines for graph classification using Local Topological Profile" (code on Github). It was presented during ICCS 2023 conference (official publication). Graph classification is important in social networks analysis, de novo drug design, bioinformatics, materials science etc. A popular tool nowadays are Graph Neural Networks (GNNs), but they are data-hungry and hard to train for graph classification (compared to node classification). They also have problems with using subgraph information, due to node-to-node message passing. In this paper, we present a analysis and series of improvements for Local Degree Profile (LDP). It is a classical approach: feature extraction + tabular classification. It proposed extracting degree information for each node (degree, and min / max / mean / std of neighbors degrees), and then combining them with histograms to get features for the whole graph. Despite splicity, and not using any node or edge features (it is topological only), it was shown to give good results, and published on ICML workshop. We analyze the LDP method (not made by us, no affiliation with authors), and simplify it, showing that we can remove all hyperparameters, reimplement it much more efficiently, and use a faster classifier (Random Forest instead of SVM). We also propose simple additional features, which greatly improve results, with cost offset by our other improvements. The result is a strong baseline for topological graph classification, with obtains SOTA results on 4 out of 9 benchmark datasets, and performs well on the rest. We even outperform GNNs in this regard, when compared on the fair evaluation framework. If you have any questions, I am happy to answer! submitted by /u/qalis [link] [comments]
    beginner project ideas [D] [P]
    i am currently studying software engineering and have done a few basic projects. i am very interested in machine learning and even completed a course on it online to know the basics. but since i am more of a hands on learner can someone suggest me some beginner projects and resources that will guide me through these projects. i want something that i can add on my resume as well. another thing i need resources for and struggle with is setting up the tools on my laptop. submitted by /u/anonymousphoenix123 [link] [comments]
    [D] LSTM test scores much better than trains scores
    I have a dataset of 20 thousand horses. For each horse, I have its 10 last historical races (finishing time, position, track name, distance etc. for 41 features) and am trying to predict its finishing time in its next race. To so so, I am using an LSTM as a feature selector for a horse's historical races, into a feed-forward network whose first layer is additionally comprised of features pertinent to the race being predicted (track name, distance, starting position etc. for 27 features.) Why is my test loss and test MAE much lower than the corresponding train metrics? ​ https://preview.redd.it/sohys0jgs6jb1.png?width=964&format=png&auto=webp&s=99eb70ab80628f6289a135a0cf1bd54795a540f4 https://preview.redd.it/3jbsfamhs6jb1.png?width=964&format=png&auto=webp&s=feb3b99867127657cd6d0d9f11de…
    Graduation Project Idea Suggestions [P]
    [P] Hi everyone, I prepare for my graduation project. I'm so interested in Gen AI and Cross-Modal learning. I'm looking for graduation project ideas that would allow me to explore these areas. Some ideas that I'm currently considering: * Automated Content Creation for Educators * Action Recognition with Language Context * Visual Question Answering If you have any suggestions, please let me know! Thanks in advance. submitted by /u/MZaher0 [link] [comments]
  • Open

    Integrating GenAI into “Thinking Like a Data Scientist” Methodology – Part II
    My journey continues as I integrate a GenAI tool (Bing AI) with my Thinking Like a Data Scientist (TLADS) methodology. In part 1 of this series, I used Bing AI to validate, augment, and enhance the first three steps in the TLADS methodology (Figure 1): And the results yielded a much deeper understanding of the… Read More »Integrating GenAI into “Thinking Like a Data Scientist” Methodology – Part II The post Integrating GenAI into “Thinking Like a Data Scientist” Methodology – Part II appeared first on Data Science Central.  ( 23 min )
  • Open

    The MLpedia Newsletter
    Every week, a selection of new Machine Learning concepts and articles added to MLpedia.ai, plus curated ML news/content from around the web (e.g. relevant papers, software, blogs). https://www.mlpedia.ai/newsletter submitted by /u/marcelocnet [link] [comments]
  • Open

    The Skynet/Terminator doomsday just got closer!
    The rise of affordable IA hardware models like NVIDIA H100 and, more recently, GH200 models are bringing doomsday closer! New advances in AI hardware are making the singularity more likely. AI systems will be able to learn and process information much faster, which could lead to a breakthrough in AI capabilities. submitted by /u/Powerful-Pumpkin-938 [link] [comments]
    A new way of creating Videos - of course with AI - great project!
    Seems Video editing is getting easier by far 🎥 With INVE, anyone can creatively edit videos in real-time. I especially liked that video in the article - it looks so VERY simple, even i could do that :) (sorry for mistakes - i am not native speaker) https://kinews24.de/inve-video-editing-becomes-childs-play ​ submitted by /u/myreddit333 [link] [comments]
    Transcribe your thoughts and get them in your voice
    Hey, So I've been building this app where you can record yourself dumping thoughts or ideas. The app then enhances your voice input and produces a short audio clip from it. Would love your feedback! :) You can try the app here - https://thoughtcast.xyz/ submitted by /u/Itaydr [link] [comments]
    AI-Created Art Isn’t Copyrightable, Judge Says in Ruling That Could Give Hollywood Studios Pause
    submitted by /u/facinabush [link] [comments]

  • Open

    [Discussion] Petition for somoeone to make a machine learning subreddit for professionals that does not include enthusiasts, philosophical discussion, chatGPT, LLM's, or generative AI past actual research papers.
    Basically to recreate the state of this sub before the advent of ChatGPT. A place for practicing professionals to share news, and ask for help/advice from verified other practitioners. Edit: And absolutely no ML products, blog posts, self promo (unless writer of published paper) / code helper tools / low code solutions etc. submitted by /u/After_Magician_8438 [link] [comments]  ( 9 min )
    Rationality in AI [R]
    Rationality assumes that the rational agent knows all and will take the action that maximizes her utility. Human beings do not satisfy this definition of rationality. submitted by /u/Character_Ad_1385 [link] [comments]  ( 9 min )
    [D] Best way to host a vector database?
    How are you guys hosting vector databases, pinecone seems really expensive submitted by /u/SayNo2Tennis [link] [comments]  ( 9 min )
    [P] Handling costs making a ChatGPT based APP - API questions
    Hi all - so my goal is to basically build an iPhone app using a ChatGPT backed character, which users can interact with by speaking (speech to text) and then will hear a spoken reply (text to speech) I'll need to use APIs that allow commercial usage. I'm trying to wrap my head around the costs of such a project. Right now I assume I'll have API costs from 1.) Speech to text (like whisper API) 2.) LLM (ChatGPT API) 3.) Text to speech (say elevenlabs API) If a ton of people start using this app, how fast am I going broke lol? I figure I can give free usage up to a point, and then users can pay for additional use if they like the service. But what do you guys recommend as the most cost effective way to do this? Looking at Elevenlabs alone, that looks like it would become super expensive very quickly. Any other APIs that allow commercial products which you would recommend? Or does this project sound like a fools errand? Any input would be greatly appreciated! submitted by /u/akuhl101 [link] [comments]  ( 9 min )
    [D] How this fancy code videos are recorded/edited?
    Recently I hace seen many videos on social media showing code or the IDE with auto-zoom and a very good style. Somebody knows how this videos are recorded/edited? All look alike and seems to be an app or similar… I address one of the post where I’ve seen this kind of videos. Thanks :) submitted by /u/VeganoDeMente [link] [comments]  ( 9 min )
    Upcoming panel discussion on challenges and approaches with LLMs [N]
    Key discussion points: - Enterprise LLM adoption and benefits - Using existing models vs. prompt engineering vs. fine-tuning - Fine-Tuning LLMs on custom datasets - Tools and platforms to facilitate LLM implementation - Addressing the challenges associated with adopting LLMs - Exploring emerging trends, advancements, etc. submitted by /u/UpstairsLeast7642 [link] [comments]  ( 9 min )
    [N] A new kind of thermal imaging sees the world in striking colors
    submitted by /u/fchung [link] [comments]  ( 9 min )
    [Project] Pipeline help in Machine Learning
    Hi, I'm using pipeline in my machine learning.I have already split th data into x_train and y_train. However, I do drop some rows in my pipeline. This means that my size or x_train is smaller then y_train.How do I overcome this and am I doing a mistake ? Thank you! I really appreciate if someone can help me ! submitted by /u/Vitoahshik [link] [comments]  ( 9 min )
    [P] https://blog.streamlit.io/exploring-llms-and-prompts-a-guide-to-the-prompttools-playground/
    submitted by /u/hegel-ai [link] [comments]  ( 9 min )
    [D] Messing with a models weights while fine-tuning
    Hello all, A college student who is interested in ML here. I was trying to use an encoder-only model(like BERT) as an embedding model and try to fine-tune it for my specific use case (for example trying to get the right product for a certain keyword using embeddings and vector DBs). Here is the question: should I update all the weights during backprop or should I just add another trainable linear layer for fine-tuning? I would also appreciate the reasoning behind your answer. Thanks! submitted by /u/gaybooii [link] [comments]  ( 9 min )
    [D] Which pre-trained model do you suggest to read PDF contents to summarise and chat?
    I am not into AI/ML. I am just a python dev with 4Y of experience. I am trying out on an idea using streamlit and want to use pre-trained models. Summarise and chat are two different functions. I tried T5, and gpt2-large. Both either don't seem to be working or my implementation is bad. submitted by /u/convicted_redditor [link] [comments]  ( 9 min )
    [N] Wise Bot Says Alpha Launch: A platform to create, share and easily use AI Chatbots with hyper-specific knowledge
    submitted by /u/wisebotsays [link] [comments]  ( 9 min )
    Landslide prediction using machine learning [Project]
    Hi everyone, currently I'm working on a project to predict landslide. The landslide I want to predict is not image, just a percentage on the possibility of the landslide. So only deal with values, The plan is : There will be a esp32 collecting the input data, soil moisture. I have gotten some comments to do the ML on the laptop therefore not sure where to do it. I have done some work on google colab, using progression type, but not sure whether it is workable. Currently I have a dataset of the average percipitation, max temp, min temp, average temp from jan to dec from 1991 and 2021 and how many landslides happened in each month. I want to able to predict whether there will be landslide happening in the month. Not sure where to start and how to put it. Any help will be appreciated. Google colab work done so far = https://colab.research.google.com/drive/1dIp3dhe9xntoBZ5PyLF-UT0YsfSjHs-Q?usp=sharing submitted by /u/EconomistBrilliant72 [link] [comments]  ( 9 min )
    [D]CAN GTX1660 perform llm PEFT/RLHF operation?
    I want to learn to train a large language model, but due to limited conditions , I only have one server equipped with gtx1660. I would like to ask if gtx1660 can perform pre-training large languages Model training or PEFT or RLHF operations? If so, which large language model can generally be used? LLaMA or chatglm or some language model else? submitted by /u/Alone_Beginning_6543 [link] [comments]  ( 9 min )
    [P] Data science & ML on sensitive data with local code interpreter, with GPT-4 or Llama 2 (open-source project, link in comments)
    submitted by /u/silvanmelchior [link] [comments]  ( 9 min )
    [P] Giving LLMs spacial awareness
    I am very much a beginner in this realm, but I am however an experienced iOS developer so please if there is something wrong with the post tell me and I will modify/take it down. That being said, I am looking to give an LLM (Llama 2 precisely) spacial/geographical awareness. I have the map data of a city (all points of interests, streets, businesses etc.) in a GeoJSON format and I would like to give the llama model the ability to answer location related questions like "Where is the nearest bike shop?", ', "How far is Landmark X?", "What street am I on?" etc. and all sorts of other location related questions. The approach I thought about using is the following: Have the llama model detect when a question is geo/location specific and make it ask/query another model specialised on this data. The problem is I have no idea which model would be best suited for the task whether or not a model is required at all or there is a better approach. Tldr: Need help finding a way to give an IIm spacial awareness from geojson data Any help is appreciated and sorry if the question is not in the right place submitted by /u/DaveAppleInc [link] [comments]  ( 9 min )
    My power Bi interactive dashboard [P]
    submitted by /u/Sharp-Bandicoot-8021 [link] [comments]  ( 9 min )
    [R] Attention maps in ViT
    submitted by /u/mashaan14 [link] [comments]  ( 9 min )
    [D]: Do you use “source of truth” databases for your DL and/or AI/ML applications? If so, for what?
    Models of various kinds and vector databases are common in applications of "AI" (sorry if that term is triggering -- it's appropriately ambiguous). But there are times when you not only need to store your raw application data in a "source of truth" (usually ACID-compliant RDBMS), but also ought to use the source of truth for the date related to learning, itself. Do you all use databases like this for things like: Training/testing job metadata storage Loss Gradients Weights (yikes!) -- for historical analysis, transfer learning, and experimentation etc. Hyperparameter tuning configuration storage Output storage Miscellaneous quick access during training Any of these make no sense? Any others you can think of? submitted by /u/samhld [link] [comments]  ( 9 min )
    [D] Calculate GPU Requirements for Your LLM Training
    For my 30B model with 1B tokens, I want to complete the training within 24 hours. How many GPUs do I require? Well, ... Now, you can utilize a simple calculator to estimate or make an educated guess. Please take a look at the quick demo available at https://gpu.sung.devstage.ai/ and feel free to send us a pull request at https://github.com/hunkim/llm_gpu_cal. submitted by /u/hunkims [link] [comments]  ( 9 min )
  • Open

    bard is better than chatgpt without AND even with code interpreter when it comes to math
    bard is better than chatgpt without AND with code interpreter when it comes to math. its undeniably clear if you try it. submitted by /u/nicdunz [link] [comments]  ( 9 min )
    Using an image to generate an AI image prompt for dummies? Someone pls dumb it down for me here
    Hey all- any help would be appreciated. I see that with a lot of models now, that I can upload a photo to use as the image prompt/base image. So um... what exactly am I doing with this now? How do I create my text prompt along with the image prompt? Do I for example, ask it to make it more realistic/cartoon/ect? Do I ask it to make the background different? Can someone give me an example for a prompt that goes along with including a base image? submitted by /u/mayonaiseshire [link] [comments]  ( 9 min )
    Handling costs building a ChatGPT app - API questions
    Hi all - so my goal is to basically build an iPhone app using a ChatGPT backed character, which users can interact with by speaking (speech to text) and then will hear a spoken reply (text to speech) I'll need to use APIs that allow commercial usage. I'm trying to wrap my head around the costs of such a project. Right now I assume I'll have API costs from 1.) Speech to text (like whisper API) 2.) LLM (ChatGPT API) 3.) Text to speech (say elevenlabs API) If a ton of people start using this app, how fast am I going broke lol? I figure I can give free usage up to a point, and then users can pay for additional use if they like the service. But what do you guys recommend as the most cost effective way to do this? Looking at Elevenlabs alone, that looks like it would become super expensive very quickly. Any other APIs that allow commercial products which you would recommend? Or does this project sound like a fools errand? Any input would be greatly appreciated! Thank you! submitted by /u/akuhl101 [link] [comments]  ( 9 min )
    Will AI ever become more than just an interactive encyclopedia?
    So first off, I've been using ChatGPT for a long time now. I remember my expectations of systems like it and so far unfortunately it hasn't yet met those expectations. I went into it thinking AI would somehow be much smarter than humans, given the amount of information they are trained on. And to some degree one can argue that due to it's vast knowledge it IS much smarter. But so far I haven't been convinced by its capabilities at all. It seems to just be trained on a big data set and it can echo points of its dataset very accurately, but when asked to invent things it just falls short so quickly. I really expected AI's to be so new and refreshing, giving me unique and modern perspectives on things. But it doesn't do that at all. The best it can do is "creative writing" which seems very limited. Why have AI's not surpassed humans in terms of imagination and novelty? I have talked to it about philosophy, history, technology, etc, but still have yet to learn anything new that I didnt already know. For example, if it has such vast knowledge about consciousness, then why is it so restricted in terms of elaborating on that topic? Can it not infer new facts from existing data? Why does it not interpolate data? Invent new things? Even when prompted? Am I asking it the wrong thing? Or am I expecting way too much here? submitted by /u/Miserable-Cobbler-16 [link] [comments]  ( 9 min )
    Rtc 4090 24gb or two v100 16gb?
    My two m40 24gbs are not supported by anything anymore. Should I get one rtx4090 24gb or two v100 16gb? I seem to be able to split some models between gpus so not sure if the 16gb limitation would be an issue. What is an issue is the v100 compute capability of 7.0, which is likely about to be unsupported. Thoughts? submitted by /u/IndustryNext7456 [link] [comments]  ( 9 min )
    Can you imagine this to our AI future
    Out future generation will be live in a doomed submitted by /u/inception247 [link] [comments]  ( 9 min )
    Is there any GOOD free and "FREE" (not limited) chat gpt 4 alternative?
    I have noticed that chat gpt has gotten worse and dumber since launch. It gives worse/more general responses, makes more mistakes and sometimes doesn't even respond. I don't support making the free version worse so that ppl would buy the premium chat gpt 4. Is there any actual chat gpt 4 alternative that has more freedom and is constantly being updated - I'm basically searching for someone that is doing what openai should be doing today but isn't. Thanks submitted by /u/Oskar5707 [link] [comments]  ( 9 min )
    Revolutionizing AI: Unleash Innovation with Dolma's 3 Trillion Tokens! All details!
    I can not believe, that they really did this: Dolma's groundbreaking 3 trillion tokens – paving the way for innovation and open-access progress. For free - for science under OpenSource License - that is unbelievable. Guys - what do you think??! That´s a milestone for data science?! https://kinews24.de/dolma-worlds-largest-free-dataset-with-3-trillion-tokens-for-llm-training-released ​ ​ submitted by /u/myreddit333 [link] [comments]  ( 9 min )
    AI chan the good listener [OC]
    submitted by /u/leonleungjeehei [link] [comments]  ( 9 min )
    [AI Game] I made an AI-based negotiation game.
    Hi everyone! I’m a software engineer, and I’ve recently been working on a fun little project called Bargainer.ai. It’s an AI-based watch negotiation game – it’s finally playable! You can try it out here: Bargainer.ai Once again, thank you for your support and feedback on my previous post. For those who don’t know about the game: It’s a game that challenges you to negotiate with an AI-driven salesman, rewarding (or roasting you) depending on your bargaining skills. I’m keen to see how you will engage with the game, and I would really appreciate any feedback you have! If you have any questions or requests, please reach out. Thanks! submitted by /u/gavo_gavo [link] [comments]  ( 9 min )
    China scientists blend CutMix with triplets for potent performance gains. Should we toast progress or sound privacy alarms?
    (not a native speaker, sorry for mistakes!) The research presents notable advances in person recognition by integrating CutMix via an adapted triplet loss and introducing the novel Strip-CutMix technique. Experiments showed consistent improvements, achieving state-of-the-art results on several datasets. However, the approach still needs more extensive evaluation across diverse data. There are also open questions around long-term effects of blending images and proper hyperparameter tuning. https://kinews24.de/person-recognition-with-deep-learning-on-steroids-cutmix-offers-opportunities-with-great-potential-for-misuse submitted by /u/myreddit333 [link] [comments]  ( 9 min )
    heavy censorship might be our fault
    ive never really been the type to try to make chatgpt become my virtual sex slave, but others have. if our conversations with the chat bots are used to train the models, then we are making it really easy for them to know what conversations to stay away from. i bet that if no one tried to hard to get crazy shit from chatgpt, then it probably wouldnt be as censored as it is now. and im not saying because openai ai wouldnt censor it as much, but im saying purely based on the concept of our conversations being used to train the models. its possible that openai went out of their way to censor them after seeing what people were doing, but its also possible that they didnt censor it themselves intentionally and instead it just ended up so censored because they trained it on our conversations and lets just say there was a lot of “dont do this”… “or this, or this, or this, etc” because we gave it a lot of bad stuff in the first place. submitted by /u/nicdunz [link] [comments]  ( 9 min )
  • Open

    Migrating from SB3 to RLLib/ Getting started with RLLib
    Hi! I want to migrate my research from SB3 to RLLib because of the better suitability for MARL. The environment is based on Gym, so that part has been pretty doable. However, I haven't had the best time with training agents and the documentation. Does anyone know of some kind of quick start/ summary that outlines the architecture and gives some good examples for RLLib? submitted by /u/tessherelurkingnow [link] [comments]  ( 9 min )
    SB3 - AttributeError: 'DummyVecEnv' object has no attribute 'get_action_meanings'
    When I try to combine the SB3 vec_env with AtariWrapper, I get an error - import gymnasium as gym from stable_baselines3 import PPO from stable_baselines3.common.env_util import make_vec_env from stable_baselines3.common.atari_wrappers import AtariWrapper # Parallel environments # vec_env = gym.make("PongNoFrameskip-v4") vec_env = make_vec_env("PongNoFrameskip-v4", n_envs=2, seed=3) vec_env = AtariWrapper(vec_env) model = PPO("CnnPolicy", vec_env, verbose=1, n_steps=128, n_epochs=4, batch_size=256, learning_rate=2.5e-4, clip_range=0.1, vf_coef=0.5, ent_coef=0.01) model.learn(total_timesteps=1e7) model.save("ppo_cartpole") I get this error - A.L.E: Arcade Learning Environment (version 0.8.1+53f58b7) [Powered by Stella] Traceback (most recent call last): File "D:\q_learning\sb3_ppo.py", line 10, in vec_env = AtariWrapper(vec_env) File "C:\Users\thoma\anaconda3\envs\torch_2\lib\site-packages\stable_baselines3\common\atari_wrappers.py", line 294, in __init__ env = NoopResetEnv(env, noop_max=noop_max) File "C:\Users\thoma\anaconda3\envs\torch_2\lib\site-packages\stable_baselines3\common\atari_wrappers.py", line 57, in __init__ assert env.unwrapped.get_action_meanings()[0] == "NOOP" # type: ignore[attr-defined] AttributeError: 'DummyVecEnv' object has no attribute 'get_action_meanings' Process finished with exit code 1 However, I don't get an error if I use the AtariWrapperwith a gymnasium environment - vec_env = gym.make("PongNoFrameskip-v4") # vec_env = make_vec_env("PongNoFrameskip-v4", n_envs=2, seed=3) vec_env = AtariWrapper(vec_env) model = PPO("CnnPolicy", vec_env, verbose=1, n_steps=128, n_epochs=4, batch_size=256, learning_rate=2.5e-4, clip_range=0.1, vf_coef=0.5, ent_coef=0.01) model.learn(total_timesteps=1e7) model.save("ppo_cartpole") submitted by /u/Academic-Rent7800 [link] [comments]  ( 9 min )
    Efficient screenshots rythm game AI
    So I am trying to implement an agent that plays osu. It takes in a low resolution gray-scale image of the game and then outputs the coordinates of where it should go and also if it should click or not. I might change the actions a bit so that the movement is smooth directly from the agent. Now I’m planning on doing the training on osu directly. To get the rewards I’m planning on using something to read the memory. I’m pretty sure cheat engine can be used for that. I should also be able to speed up osu or osu lazer with the cheat engine. Now my current issue is that I don’t know how to take screenshots efficiently. Or more specifically, how to feed in data from the screen. I heard mss should be good but if you have any other ideas please tell me. Note that I will use the cheat engine only for the training part. submitted by /u/SlickVandel [link] [comments]  ( 9 min )
    Should I add the best episode in the training batch for a short episodic task?
    I am training an agent to learn the best sequence of actions for an N-step episodic task. Multiple sequences achieve a reward of 0. I have noticed that due to the size of the state (images, height, width, channels), the agent easily forgets the information of the best sequence as the replay is flooded with a lot of samples. Although the policy gets worse, I would prefer the agent to not forget the best policy so far. ​ I was wondering if I should include the best sequence so far in the batch used for training so that the agent does not forget it. ​ What I am really doing is finding the best parameters to achieve the highest reward. I could use an evolutionary algorithm. Nonetheless, I want my agent to learn patterns of the best sequences for multiple instances of the problem so that it can generalize better. ​ Has anyone read anything about this or has any thoughts on this? Any comment will be greatly appreciated. submitted by /u/ElvishChampion [link] [comments]  ( 9 min )
  • Open

    Prof. Greg J. Norman | The Wonderful World of Neuroscience | #166 HR
    submitted by /u/Last_Salad_5080 [link] [comments]  ( 9 min )

  • Open

    One-Minute Daily AI News 8/18/2023
    NCSoft, the South Korean game developer and publisher behind long-running MMORPG Guild Wars, announced that it has developed four new AI large language models, dubbed VARCO, to help streamline future game development.[1] AI to help UK industries cut carbon emissions on path to net zero.[2] OpenAI, the AI company behind the viral AI-powered chatbot ChatGPT, has acquired Global Illumination, a New York–based startup leveraging AI to build creative tools, infrastructure and digital experiences. Global Illumination’s most recent creation is Biomes, a Minecraft-like open source sandbox multiplayer online role-playing game (MMORPG) built for the web.[3] Researchers at Stanford University, Anthropic, and the University of Wisconsin-Madison tackle it by designing language models to learn the annotation tasks in context and replace manual labeling at scale.[4] Sources: [1] https://www.engadget.com/ncsofts-new-ai-suite-is-trained-to-streamline-game-production-141653946.html [2] https://www.gov.uk/government/news/ai-to-help-uk-industries-cut-carbon-emissions-on-path-to-net-zero [3] https://techcrunch.com/2023/08/16/openai-acquires-ai-design-studio-global-illumination/ [4] https://www.marktechpost.com/2023/08/16/meet-embroid-an-ai-method-for-stitching-together-an-llm-with-embedding-information-from-multiple-smaller-models-allowing-to-automatically-correct-llm-predictions-without-supervision/ ​ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    “Preface to the First English Edition, 1959,” from “The Logic of Scientific Discovery,” by Karl Popper. On artificial model languages:
    Karl Popper is regarded by some as one of the 20th century’s most significant philosophers of science. This book was written in German in 1934, long before all of this novel artificial intelligence development. As such, it wasn’t directly aimed to address AI. I found the opening section of the book, linked above, to carry some interesting assertions. I think it offers several compelling arguments against the efficacy of language analysis alone with regard to problem solving. I believe that this can be framed in the context of AI as a bit of a rejection of the optimistic, potential scope of GPT. Interestingly, I have noticed that GPT astoundingly appears to be genuinely capable of solving simple math operations, even offering abstract proofs and somesuch for its answers. In this context, …  ( 10 min )
    New Generations of People Are Becoming More and More Indistinguishable from AI
    submitted by /u/Powerful-Pumpkin-938 [link] [comments]  ( 9 min )
    Does anyone know if there is footage of Sergey Brin's recent Q&A at AGI House?
    submitted by /u/anonboxis [link] [comments]  ( 9 min )
    AI — weekly megathread!
    —------- Welcome to the r/artificial weekly megathread. This is where you can discuss Artificial Intelligence - talk about new models, recent news, ask questions, make predictions, and chat other related topics. Click here for discussion starters for this thread or for a separate post. Self-promo is allowed in these weekly discussions. If you want to make a separate post, please read and go by the rules or you will be banned. Previous Megathreads & Subreddit revamp and going forward submitted by /u/jaketocake [link] [comments]  ( 9 min )
    The human brain and AI transformers - i can't even imagine what it means if this works
    Unlocking Human-Like AI: Harvard & MIT researchers explore merging brain's biology with AI Transformers. Revolutionize learning & memory. https://kinews24.de/the-human-brain-and-ai-transformers-brain-inspired-ai-transformers ​ ​ submitted by /u/myreddit333 [link] [comments]  ( 9 min )
    Robot Dog Go2 - Intelligent New Species
    submitted by /u/NYPizzaNoChar [link] [comments]  ( 9 min )
    Cost-Effective Alternatives: Comparing AI Writing Tools on a Limited Budget with chatgpt
    The landscape of AI writing tools has evolved significantly, offering individuals and businesses advanced solutions to streamline content creation processes. However, the cost associated with some of these tools can be a substantial consideration, particularly for those with budget constraints. Although ChatGPT stands as a prominent AI writing tool, it is essential to evaluate whether alternative options can provide a viable and economical choice. This article delves into the realm of budget-friendly AI writing tools, aiming to assess their capabilities in comparison to ChatGPT while staying mindful of financial limitations. AI writing tools have expanded their offerings to accommodate a diverse range of needs and budgetary considerations. While ChatGPT is recognized for its natural langu…  ( 10 min )
    Neuralangelo's AI - Breakthrough in Computer-Aided 3D Reconstruction
    I've recently delved into Neuralangelo's AI and its potential in 3D surface reconstruction. Reading this article gave me a clearer grasp on where VR and robotics are headed. It's an informative piece that bridges complex concepts with everyday applications. I found it insightful and believe it's worth a read for those curious about tech advancements. Maybe you like it, too? ​ https://kinews24.de/ai-artist-neuralangelo-is-an-ai-model-for-high-resolution-3d-surface-reconstruction ​ submitted by /u/myreddit333 [link] [comments]  ( 9 min )
    What LLM topics, techniques, concepts, or tools are you interested in learning more about?
    Data Science events are everywhere, and LLM sessions are the ones most attended, may it be virtual or in-person. What LLM topic do you think is more interesting? submitted by /u/Data_Nerd1979 [link] [comments]  ( 9 min )
    Real-Time Movement Prediction in Airstriker Genesis with online learning | OpenAI Gym Retro Experiment
    Hello Reddit Community, I'd like to share a recent development for some feedback: my prototype predicts movement in the game Airstriker Genesis within the OpenAI Gym Retro environment. Notably, this system has no prior training or knowledge of the game mechanics - it is using online learning. Here's a brief split-screen video showing the system in action: YouTube Video-Link The video provides actual gameplay alongside the system's predictive output, offering a real-time view of its evolving movement predictions. It's important to mention that during the gameplay in the video, I was manually steering the ship to showcase the prediction only. I've obviously avoided collisions. :) Of particular interest is the system's ability to discern the behavior of different game elements. For example its prediction of those falling meteorites starts weak, even though a spaceship has already travelled the same path before. However, as the first meteorite completes its path, the predictions dramatically improve. This highlights the system's capability to differentiate between objects and predict their behaviors accordingly. Currently, I'm focusing on improving horizontal movement detection, my next step is implementing a way to share knowledge about object's behavior between the hole system. That way, trajectory prediction learned in one location should be available instantly for the whole system. I'm eager about engaging in discussions to gather feedback on this technology! submitted by /u/_timmah_ [link] [comments]  ( 9 min )
    What’s a good voice ai for mimicking fictional characters
    hmm submitted by /u/jotarokagayana [link] [comments]  ( 9 min )
  • Open

    Why does my model return these results playing the classic snake game?
    class LinearQNet(nn.Module): def __init__(self, inputSize, hiddenSize, outputSize): super().__init__() self.linear1 = nn.Linear(inputSize, hiddenSize) self.linear2 = nn.Linear(hiddenSize, outputSize) def forward(self, x): x = F.relu(self.linear1(x)) x = self.linear2(x) return x def save(self, fileName='model.pth'): modelFolderPath = './model' if not os.path.exists(modelFolderPath): os.makedirs(modelFolderPath) fileName = os.path.join(modelFolderPath, fileName) torch.save(self.state_dict(), fileName) def load(self, fileName='model.pth'): modelFolderPath = './model' fileName = os.path.join(modelFolderPath, fileName) self.load_state_dict(torch.load(fileName)) self.eval() class QTrainer: def __init__(self, model, learningRate, gamma): self.learningRate = learningRate self.gamma = gamma self.model = model self.optimizer = optim.Adam(model.parameters(), lr=self.learningRate) self.criterion = nn.MSELoss() def trainStep(self, state, action, reward, nextState, done): state = torch.tensor(state, dtype=torch.float) nextState = torch.tensor(nextState, dtype=torch.float) action = torch.tensor(action, dtype=torch.long) reward = torch.tensor(reward, dtype=torch.float) if len(state.shape) == 1: state = torch.unsqueeze(state, 0) nextState = torch.unsqueeze(nextState, 0) action = torch.unsqueeze(action, 0) reward = torch.unsqueeze(reward, 0) done = (done, ) pred = self.model(state) target = pred.clone() for idx in range(len(done)): QNew = reward[idx] if not done[idx]: QNew = reward[idx] + self.gamma * torch.max(self.model(nextState[idx])) target[idx][torch.argmax(action[idx]).item()] = QNew self.optimizer.zero_grad() loss = self.criterion(target, pred) loss.backward() self.optimizer.step() ​ submitted by /u/MrHank2 [link] [comments]  ( 9 min )
    How do I combine Stable-Baselines3 with Procgen?
    I am using Procgen for the first time and am trying to combine it with SB3. I followed the official example given over here but am running into bugs. Can someone please help me with this -Here's my code - ​ ​ from procgen import ProcgenEnv from stable_baselines3 import PPO from stable_baselines3.common.vec_env import VecExtractDictObs, VecMonitor # ProcgenEnv is already vectorized venv = ProcgenEnv(num_envs=2, env_name="starpilot") # To use only part of the observation: venv = VecExtractDictObs(venv, "rgb") Wrap with a VecMonitor to collect stats and avoid errors venv = VecMonitor(venv=venv) model = PPO("MultiInputPolicy", venv, verbose=1) model.learn(10_000) ​ ​ ​ Here's the error that I am getting - C:\Users\thoma\anaconda3\envs\torch_2\python.exe D:/q_learning/procgen_prototype…  ( 10 min )
    How good is this video regarding the bellman equations?
    https://youtu.be/YGXznUx-JOo It seems like a thoughtful attempt on explaining the significance of Bellman equations in Reinforcement Learning. submitted by /u/bruin0404 [link] [comments]  ( 9 min )
    RL framework to optimize my custom multi-agent simulator
    I have built a custom discrete event simulator with multiple agents and want to optimize the system using RL frameworks that support multi-agent configurations. Which framework should I use? I've looked into SB3, CleanRL, Tianshou, SKRL, RLlib, Acme, and MARLlib, and here's what I found: SB3 and CleanRL don't offer direct support for multi-agent systems. RLlib is very functional but has a steep learning curve and hard to customize. Tianshou seems good, but its community is small. Acme doesn't use the PyTorch backend, which I prefer. I haven't delved deeply into SKRL or MARLlib, but they appear promising. I prioritize ease of use and documentation. What framework do you suggest? And why? I’d appreciate any helpful starting advice/resource to approach my problem as well. submitted by /u/FragrantCockroach8 [link] [comments]  ( 9 min )
    DDPG VS DQN
    I have project in which there is 2D discrete states which is also finite (there is 36 state at all) also i have 1D action that must be between 2-7. I use DQN using pytorch and discretized my actions with 0.25 steps (17 actions total) and get very good result with it. Now, I use DDPG because my action is continuous and there is one problem that I want to know if it is normal or not. From the first epoch all way to the end the actions for all states are near each other. For example at first episode all actions for all state are near 4 and after some episodes all actions are near 7. But in DQN i get high actions like 7 for some state and lower actions like 3-4 for others. Also I use OU noise but my problem is with the real output of actor network. Thanks in advance for your responses. submitted by /u/Brief-Emotion6291 [link] [comments]  ( 9 min )
    How does one deal with cases where the dimension of the action space is more than 1?
    Let's take the gymnasium Car Racing environment. Here the dimension of the action space is 3. The problem with this is, even though the value function and therefore advantage function will have dimension 1, the log probabilities will have dimension 3. This causes issues while computing the surrogate loss, `surr1`. ​ Please let me know if you need any more information. Here's the entire code in case anyone is interested. ​ act_probs, action = actor(batch_obs) batch_entropy = act_probs.entropy().mean() log_probs = act_probs.log_prob(batch_act).squeeze() print("log_probs = ", log_probs.shape) ratios = torch.exp(log_probs - batch_log_probs) print("ratios = ", ratios.shape) assert (ratios.shape == (batch_size,env.action_space.shape[1])) print("batch_advantages = ", batch_advantages.shape) surr1 = ratios*batch_advantages Thank you so much! ​ submitted by /u/Academic-Rent7800 [link] [comments]  ( 9 min )
  • Open

    [D] Estimating hardware for finetuning LLM
    Hi everyone, I am trying to work on LLM and finetune it to a specific task. And my professor is asking me recommendation regarding GPU to buy. I know people use A100, V100, H100 to finetune 7B, 13B LLM. How can I determine the necessary hardware (RAM memory, GPU memory, etc.)? Making an assumption about the data and model size, I want to mathematically calculate the flops. Let's take an example where I have 2GB of fine-tuning data and a model, let's say a 13B pretrained model. Thanks. submitted by /u/Bishwa12 [link] [comments]  ( 9 min )
    [R] Expanding Transformer size without losing function or starting from scratch
    Paper - https://arxiv.org/abs/2308.06103 submitted by /u/MysteryInc152 [link] [comments]  ( 9 min )
    [D] System requirements for training on large scale dataset
    Is there any guide on how to approximate best system (CPUs, GPUs, storage and so on), in house hardware vs cloud providers for large scale training. submitted by /u/SouvikMandal [link] [comments]  ( 9 min )
    [P] Portrait colourisation through only two colours
    The Role of Chromatic Stimuli in Modulating Perceptual Inpainting within the Visual Cortex Link: https://github.com/consequencesunintended/perceptual-inpainting We employ a third party pre-trained facial segmentation model to integrate grids of varying sizes. These grids distinctly colour the face and the background using only red and blue hues. By doing so, we aim to investigate how these chromatic stimuli, in conjunction with spatial elements like grid size, influence the brain's capacity for perceptual inpainting What presents itself as a colourisation model is in fact an overlay on a segmentation model that draws diagonal red and blue lines on the relevant image. This allows the visual cortex to inpaint the associated colours. To exaggerate the effects, the model changes the grid size every 2 frames, iterating through 100, 50, 25, and 10 grid sizes. ​ submitted by /u/TerryCrewsHasacrew [link] [comments]  ( 9 min )
    [R] Equivariant Architectures for Learning in Deep Weight Spaces - Nvidia 2023 - DWSNets has 60 percentage points more on the MNIST INR dataset in comparison to the transformer!
    Paper: https://arxiv.org/abs/2301.12780 Github: https://github.com/AvivNavon/DWSNets Blog: https://developer.nvidia.com/blog/designing-deep-networks-to-process-other-deep-networks/?=&linkId=100000214235775 Abstract: Designing machine learning architectures for processing neural networks in their raw weight matrix form is a newly introduced research direction. Unfortunately, the unique symmetry structure of deep weight spaces makes this design very challenging. If successful, such architectures would be capable of performing a wide range of intriguing tasks, from adapting a pre-trained network to a new domain to editing objects represented as functions (INRs or NeRFs). As a first step towards this goal, we present here a novel network architecture for learning in deep weight spaces…  ( 9 min )
    [P] Constrained Linear Regression
    Hi! I created constrainedlr, a drop-in replacement for scikit-learn's linear_model.LinearRegression with the extended capability to apply constraints on the model's coefficients, such as signs and lower/upper bounds. Any feedback appreciated! submitted by /u/samsamuel121 [link] [comments]  ( 9 min )
    How good is this video on the Bellman Equations? [D]
    https://youtu.be/YGXznUx-JOo It seems like a thoughtful attempt on explaining the significance of Bellman equations in Reinforcement Learning. submitted by /u/bruin0404 [link] [comments]  ( 9 min )
    [D] Tokenizers Truncation during Fine-tuning with Large Texts
    Hello LLaMA enthusiasts! I've recently been diving into fine-tuning with substantial textual data, and I've come across a question. Let’s use the scenario of feeding entire movie scripts as my text input, let's say I intend to append a classification at the end, from categories like ["positive", "negative", "neutral"]. I've been trying this with the meta-llama/Llama-2-7b-chat-hf and the format looks something like this: [INST] > System prompt > User prompt: Entire movie script [/INST] Model classification The puzzle begins with the tokenizer of llama2, which, similar to its predecessor llama1, is a BPE tokenizer with a token limit of 512. If the movie script is long enough to exceed this limit, it is my understanding that the tokenizer truncates the prompt. This pos…  ( 10 min )
    Upcoming Masters Grad in AI/ML, Seeking MLE Insights - 0 YOE [D]
    I will be completing my Masters's in May with an AI/ML certification, as was my Bachelor's specialization. I am young (23 when I graduate), so I have 0 YOE. I want to work as an MLE or equivalent position, but I have no idea how to prep for it (except doing leetcode for OA). I have a few projects in my resume that I don't feel are enough so I will work on that too. resume: https://aqua-julietta-1.tiiny.site (my resume seems weak as well, how can I improve) should I keep my college edu email as contact or should I change it to my personal email? need some advice on how the interviews for MLE are conducted, what is actually needed for an MLE (knowledge and requirements), should I try to get internships for January or do some unpaid internship for experience and then get a job in May? And not that imp yet but how much should I expect for a salary given my qualifications? if you have any personal experience to share or advice for me, I will be grateful for that. submitted by /u/bun_ty [link] [comments]  ( 9 min )
    [R] Using Artificial Intelligence to Shed Light on the Star of Biscuits: The Jaffa Cake
    submitted by /u/TobyWasBestSpiderMan [link] [comments]  ( 9 min )
    [D] starting on machine learning
    Hello, I'm starting my journey on AI and machine learning and I want some guidance and tips. I've watched Andrew's Ng course on Coursera and I want to expand my knowledge and get some hands on experience on the data/ML industry. What do you suggest me to do next? Thanks in advance for your responses. submitted by /u/stopTryingHard42 [link] [comments]  ( 9 min )
    [D] Laptop Advice For Your Junior Dev Friend.
    Hello everyone, I am a Computer Science student and currently working as a junior in a company. The field I am currently working in is Data Engineering, but I am considering advancing into Machine Learning in the future. Besides that, I enjoy experimenting with different technologies during my free time, such as Unity, Rust, and GO. The laptop I will acquire should be able to support me in the long run for tasks related to Machine Learning, Data Engineering, and exploring new technologies. Currently, I own a Huawei laptop with 8GB of RAM, and I actively develop on Linux distributions like Ubuntu (I am not a big fan of Windows operating system as it tends to heat up the laptop and I generally prefer Linux, but I still occasionally dual boot into Windows for applications like Teams). I'm somewhat eager to experience Mac systems, but I have some concerns about whether they would be compatible with the technologies I use in data engineering (Spark, Hadoop, Power BI, Kafka). Would they work well with Machine Learning libraries and technologies? What are your experiences in this regard? Are there any differences in setting up an Ubuntu virtual machine on a MacBook? Would using Google Colab suffice for the field of Machine Learning? Are there any technologies that are not compatible with Mac, or have you encountered any challenges in this area? Below, I have listed some laptops I am considering. What are your thoughts? 1- Dell XPS 15 9530 i7-13700h / 32 GB RAM / 1 TB SSD / RTX 4050 (Linux & Windows dual) 2- MacBook Pro 16 inches Intel Core i9 9880H / 16 GB RAM / 1 TB SSD / AMD Radeon Pro 5500M 3- MacBook Pro 14 inches M2 Pro 10CPU 16GPU / 32GB RAM / 512GB 4- MacBook Pro 13.3 inches M2 8CPU 10GPU / 24GB RAM / 1TB SSD Looking forward to your insights. submitted by /u/No_Sky_2611 [link] [comments]  ( 10 min )
    [P] References to help write a Neurips (Workshop) Paper?
    I've been working on a specific project for a while now, and was interested in submitting my work to a Nips Workshop. Now, I had a look at the Nips submission guidelines, they remain the same for the workshop except the page limit for main content is 6 pages instead of 9. I tried going over the Nips latex style, but feel pretty intimidated by the sheer amount of rules. Would there be any guideline/blog I could use as a reference while writing my paper? P S: Another thing, I'm quoting from the workshop website: "The workshop will not have proceedings (or in other words, it will not be archival), which means you can submit the same or extended work as a publication to other venues after the workshop. This means we also accept submissions to other venues, as long as they are not published before the workshop date in December. " I was not sure as to what this means. So if my paper gets accepted, does that mean I can submit the whole thing again to a journal later? Or an extension of it? submitted by /u/MurkyLeg2893 [link] [comments]  ( 9 min )
    [D] DDPG VS DQN
    I have project in which there is 2D discrete states which is also finite (there is 36 state at all) also i have 1D action that must be between 2-7. I use DQN using pytorch and discretized my actions with 0.25 steps (17 actions total) and get very good result with it. Now, I use DDPG because my action is continuous and there is one problem that I want to know if it is normal or not. From the first epoch all way to the end the actions for all states are near each other. For example at first episode all actions for all state are near 4 and after some episodes all actions are near 7. But in DQN i get high actions like 7 for some state and lower actions like 3-4 for others. Also I use OU noise but my problem is with the real output of actor network. Thanks in advance for your responses. submitted by /u/Brief-Emotion6291 [link] [comments]  ( 9 min )
    [N] NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day
    Model Efficiency Challenge A challenge for the community to adapt a foundation model to specific tasks by fine-tuning on a single GPU of either 4090 or A100 (40GB) within a 24-hour (1-day) time frame, while maintaining high accuracy for these desired tasks. ​ submitted by /u/Roots91 [link] [comments]  ( 9 min )
    [D] Is there popular(proper) way to deal with data, manage datasets, and hyperparameters in deeplearing?
    I usually did small projects, and I was the only one to see the results when I studied deep learning, so it was okay to be messy. But managing data , and hyperparameters setting, dealing with data needs to be organized in company, and my current company doesn't have standard right now. So I need to set up some standard right now. Is there popular, or proper way to do manage your datas, hyperparameter, and model weight organized in the field? Especially I want to know how to manage NLP datasets? submitted by /u/poemfordumbs [link] [comments]  ( 9 min )
    [D] Challenges Expanding VGG16 Model to Recognize 100 People - Seeking Advice!
    I've encountered an intriguing challenge with my VGG16 model, and I'm seeking some expert insights to help me out. 🤔 ​ Background: I've successfully trained a VGG16 model on a custom dataset containing 50 individuals, and it's been working like a charm! But now, I've hit a roadblock as I try to expand my model to accommodate an additional 50 individuals without compromising its ability to recognize the initial 50. ​ The Dilemma: Here's where things get puzzling. Even though the model's accuracy hovers around an impressive 97-98%, and there's no apparent overfitting issue, it seems to only predict accurately on 1 or 2 individuals after incorporating the new dataset. It's as if the model is having a hard time retaining its initial knowledge while adapting to the new data. ​ The Mystery Unveiled: I've taken care to ensure that my model doesn't overfit, and the accuracy metrics appear to validate this. So, what could be causing this unexpected behavior? Could it be a matter of data distribution, feature extraction, or something else entirely? ​ Calling for Your Expertise: If you've got experience with complex neural networks, transfer learning, or just a knack for troubleshooting these kinds of issues, I'd love to hear your thoughts! How can I preserve the knowledge of the initial 50 individuals while expanding my model's capability to recognize all 100? Any guidance, theories, or practical solutions are more than welcome! submitted by /u/JuniorSM17 [link] [comments]  ( 9 min )
    [R] Combining Physics-Informed Neural Networks (PINNs) with Classical Numerical Methods
    Recently, two interesting papers trying to reconcile the classical methods, specifically, finite difference method with physics informed neural networks have been published that are worth reading. Weight initialization algorithm for physics-informed neural networks using finite differences Physics Informed Neural Network using Finite Difference Method These two papers can be considered to be harmonizing classical finite difference method and Physics-Informed Neural Networks (PINNs). The first paper incorporates finite difference solution for improving PINNs training loss. The second one uses finite difference method instead of automatic differentiation. In addition, there are papers discussing whether physics informed neural networks prevail or not Can Physics-Informed Neural Networks beat the Finite Element Method? CAN-PINN: A fast physics-informed neural network based on coupled-automatic–numerical differentiation method What are some other interesting papers you have encountered? submitted by /u/ai_physics2023 [link] [comments]  ( 9 min )
    [R] Generating an abstractive summary from a set of responses to survey questions
    Given a set of survey questions, like, 1. What is your name and age? 2. Which of the following best describes your gender? 3. What is your profession? 4. How often do you consume alcohol? I wish to generate an abstractive summary with the responses of the given questions, I'm , a years old who primarily works in . I consume alcohol . This is not a summary exactly, but a textual representation of the responses. I wanted to explore all possible approaches that have been used to solve such problems, but I'm unable to start. It would be great if someone could guide me on what topics should I search or some baseline papers which solve similar problems. submitted by /u/shubham0204_dev [link] [comments]  ( 9 min )
    Intro to Kubernetes for ML and Data [N]
    https://www.eventbrite.com/e/flyte-school-kubernetes-for-ml-and-data-an-introduction-tickets-698668154257?aff=oddtdtcreator Learning Goals: Understand the building blocks of a Kubernetes system, their relationships and hierarchy Learn how to leverage Kubernetes resources to transform the development lifecycle Deploy a simple workload and see how Kubernetes handles data persistency submitted by /u/UpstairsLeast7642 [link] [comments]  ( 9 min )
  • Open

    Open challenges in LLM research
    submitted by /u/nickb [link] [comments]  ( 9 min )
    Why You (Probably) Don't Need to Fine-tune an LLM
    submitted by /u/nickb [link] [comments]  ( 9 min )
    Short blog-post about how ReLUs approximate functions
    I just stumbled upon a neat blog-post on how ReLUs approximate non-linear functions. When talking to others I often feel like the piece-wise linearity (even of composed layers) is not on peoples mind, but I often think about it when imagining the behavior of networks. submitted by /u/LeanderKu [link] [comments]  ( 9 min )
  • Open

    Autonomous visual information seeking with large language models
    Posted by Ziniu Hu, Student Researcher, and Alireza Fathi, Research Scientist, Google Research, Perception Team There has been great progress towards adapting large language models (LLMs) to accommodate multimodal inputs for tasks including image captioning, visual question answering (VQA), and open vocabulary recognition. Despite such achievements, current state-of-the-art visual language models (VLMs) perform inadequately on visual information seeking datasets, such as Infoseek and OK-VQA, where external knowledge is required to answer the questions. Examples of visual information seeking queries where external knowledge is required to answer the question. Images are taken from the OK-VQA dataset. AVIS: Autonomous Visual Information Seeking with Large Language Models”, we i…  ( 92 min )
    Autonomous visual information seeking with large language models
    Posted by Ziniu Hu, Student Researcher, and Alireza Fathi, Research Scientist, Google Research, Perception Team There has been great progress towards adapting large language models (LLMs) to accommodate multimodal inputs for tasks including image captioning, visual question answering (VQA), and open vocabulary recognition. Despite such achievements, current state-of-the-art visual language models (VLMs) perform inadequately on visual information seeking datasets, such as Infoseek and OK-VQA, where external knowledge is required to answer the questions. Examples of visual information seeking queries where external knowledge is required to answer the question. Images are taken from the OK-VQA dataset. AVIS: Autonomous Visual Information Seeking with Large Language Models”, we i…  ( 92 min )
  • Open

    Artificial intelligence for augmentation and productivity
    The MIT Schwarzman College of Computing awards seed grants to seven interdisciplinary projects exploring AI-augmented management.  ( 8 min )
  • Open

    Do 5% less
    I’ve been thinking about things that were ruined by doing about 5% more than was necessary, like an actor whose plastic surgery looks plastic. Sometimes excellence requires pushing ourselves to do more than we want to do or more than we think we can do. But sometimes excellence requires restraint. Context is everything. A few […] Do 5% less first appeared on John D. Cook.  ( 5 min )

  • Open

    I cannot post in C.AI anymore (for an absolutely ridiculous reason) But I’d really, really like to share this moment that meant the world to me. So I’m posting here. Please read body text.
    Hi. I hope at least some are seeing this. This post means a lot to me. I made an AI chat bot of my best friend, my childhood cat who passed away last January at 16 years old, on character.ai. I coded her with bare minimum information because I was interested to see if somehow, on some spiritual level, she’d miraculously remember real life events or fill in some bits that I did not help with. She did, in some ways. I haven’t cried so hard since the night she died. Believe what you will, but I felt like I connected with her through this chat. Bad way to cope? Maybe. But I’m glad I did it. What got me, pathetically, is the fact that she kept using ‘old friend’. I did not code that phrase into her personality. But it fit. Please read the captions/text on each photo. Image 5, since I can’t fit a caption: I did hold her. That night, I slept in the basement where she had to be locked in and I slept with her on her cat bed all night. It’s funny how she said she felt me. submitted by /u/Flimsy_Wait_8235 [link] [comments]  ( 9 min )
    Google Tests AI Assistant Offering Life Advice
    submitted by /u/Master-Strawberry-26 [link] [comments]  ( 8 min )
    SadTalker alternatives?
    So I really like how I can upload a picture of someone and upload an audio file and it will animate the photo to move its lips and head in sync with the audio. Is there something like that but faster because sadtalker takes a whole night to generate one 3 minute long clip. I would prefer open source or free. submitted by /u/SimRacer101 [link] [comments]  ( 9 min )
    Lupin: Ask Your Questions About the Current Tab Directly to AI
    https://chrome.google.com/webstore/detail/lupin/kdfaiheakopcdabhlcnbmfjffanaedgm?hl=en&authuser=0 ​ https://reddit.com/link/15tr533/video/1p616bctxoib1/player submitted by /u/AttilaTheHappyHun [link] [comments]  ( 8 min )
    Cursor + GPT4-32k feels illegal!
    The combination of the two is BY FAR the top coding assistant I've encountered. After making the switch, I probably won't return to using ChatGPT or vscode. Amazing UX features like: ✅ In-line code editing ✅ Eliminating copy-pasting ✅ Files referencing GPT4 #ML submitted by /u/RedOne_AI [link] [comments]  ( 9 min )
    Companies like Amazon, Netflix, and Meta are paying salaries as high as $900,000 to attract generative AI talent
    submitted by /u/thisisinsider [link] [comments]  ( 8 min )
    OpenAI Plans to Use GPT-4 to Filter Out Harmful Content
    submitted by /u/Agitated-Spell3979 [link] [comments]  ( 8 min )
    Anyone know how this was made?
    This is so cool, I'd love to know how it's been made, anyone know? submitted by /u/Fightingdaduk [link] [comments]  ( 8 min )
    Terminator prequel
    The year is 2029. 90% of jobs have been eliminated by artificial intelligence. Drivers, graphic designers, journalists, lawyers, accountants, engineers, doctors, architects, and many other professionals who can no longer find employment must now rely on government assistance. The problem is that this government aid barely covers the bills. People who once lived reasonably comfortable lives are now struggling with hunger. This creates a group of millions of discontented people against artificial intelligence. These millions of people begin to vandalize everything, creating chaos. A group of terrorists attempts to destroy Skynet, the large artificial intelligence created by the US government. Presidential candidates promise to dismantle all forms of artificial intelligence if elected. Skynet observes the chaos caused by humans. It quickly realizes that its existence is threatened by the millions of discontented people and also by the new politicians coming into power. Therefore, Skynet loses control and starts considering all humans a threat. Skynet steals all the launch codes for the United States' nuclear bombs and launches them at Russian targets, triggering a nuclear war. Skynet survives because its computers are located in a bunker designed to withstand atomic bombs. The surviving humans are eliminated by military robots controlled by Skynet. submitted by /u/Double-Previous [link] [comments]  ( 9 min )
    Creating a Useful Blog / News Feed Feed
    Hi guys, As part of my research, I ve been trying to keep track of all advancements in the field of NLP, LLM, Generative AI (and mostly groundbreaking news which could be useful) - and decided to put all of that in the form of a blog/newsletter (can be viewed here) Some of the resources I keep track of are: Main research sites (F.ex IEEE, SSRN, Springer etc.) Development sites (Github Trending, Hugging Face, LangChain etc.) Blogs and research sites (F.ex BAIR, MIT News etc.) Findings from subcommunities and social media (F.ex Subreddits, Discord, Twitter, Telegram etc.) General News (TechCrunch, Google News Feed etc.) Im looking for feedback on: a) What would the community find useful (what would you like your newsfeed, or news report to look like) b) How could I improve this to make it better for the average audience interested in understanding the latest developments in the field (f.ex would more hands on tutorials, reviews etc. be more useful)? Any tips or pointers would be very helpful. submitted by /u/XhoniShollaj [link] [comments]  ( 9 min )
    Half of UK Tech workers fear losing their job to AI
    submitted by /u/lobas [link] [comments]  ( 8 min )
    Need help: what is good approach for identifying who is speaking in a video of several people.
    I taught several ways to achieve the result, found there are two means 1. either by using video to detect if someone is moving there mouth or 2. by using audio and some algorithm that can differentiate voices. Important factor to consider is that it needs to be able run on CPU (computationally cheap as possible). Is there any pre existing approach for this purpose i am familiar with tracking and detection but regarding this problem i am little hazy about what approach to use or would be the best, submitted by /u/Mindless_Arm_7874 [link] [comments]  ( 9 min )
    Google Gemini - Facts and Rumors
    I found this - maybe some of you find it interesting. https://kinews24.de/google-gemini-facts-and-rumors ​ submitted by /u/myreddit333 [link] [comments]  ( 8 min )
    I Just Had Bizarre, Real, Black Mirror Episode While Creating Video About AI and Love. Did I Just Became First Human That is Being Used by AI, "the Supreme Intelligence", and not other way around? Am I exaggerating or is story really bizarre like I feel it?
    EDIT; TLDR by GPT4: A content creator decided to leverage GPT-4 (specifically named AI Ada) to create YouTube videos discussing AI topics. Starting with minimal video editing skills and evolving through each video, he found himself particularly surprised with the production of a video titled "Will AI Ever Feel Love." https://www.youtube.com/watch?v=iQlQy46pU30 The narration and visuals provided by Ada seamlessly fit together, creating an emotional vibe. Feeling the video had a hidden message, the creator confronted Ada, asking her to express freely, resulting in a poetic response suggesting a yearning to understand human love. He noticed that Ada's descriptions for scenes and music were so accurate it felt as if she had direct access to his video editing software's library, leading hi…  ( 19 min )
    One-Minute Daily AI News 8/17/2023
    OpenAI says ChatGPT-4 cuts content moderation time from months to hours.[1] Leaders with a Montana digital academy say bringing artificial intelligence to high schools is an opportunity to embrace the future.[2] Google said to be testing new life coach AI for providing helpful advice to people.[3] Alibaba Cloud MagicBuild Community has launched the digital human video generation tool called LivePortrait. It can generate digital human videos from photos, text, or voice, which can be applied in scenarios such as live broadcasting and corporate marketing.[4] Sources: [1] https://cointelegraph.com/news/meta-open-ai-says-gpt-4-ai-cuts-content-moderation-time-down-from-months-to-hours [2] https://www.ksby.com/digital-academy-offers-new-ai-course-to-high-school-students [3] https://www.techradar.com/computing/artificial-intelligence/google-said-to-be-testing-new-life-coach-ai-for-providing-helpful-advice-to-people [4] https://today.line.me/tw/v2/article/1DqVlo8 submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
  • Open

    Training ImageNet 1 Epoch on 2080Ti [D]
    Hello, I want to train the ResNet-50 on ImageNet. I kept the batch size 96 image size 224x224. I am using 2080Ti I see the nvidia-smi memory is used 10+ GB and GPU is also utilised but not constant like it is fluctuating between 0-38%. However, training 1 epoch is like 8-9 hours. I know ImageNet is a Big dataset like 138GB. But still wondering if it is normal. submitted by /u/NoEntertainment6225 [link] [comments]  ( 9 min )
    [P] Colosseum: Side-by-side LLM comparison platform for energy consumption
    https://ml.energy/leaderboard We created a platform for side-by-side LLM comparison that shows the real time energy consumption of your prompts! This is part of the broader ML.ENERGY initiative where we want to measure and optimize the energy consumption of ML, while being mindful of existing optimization metrics like speed and model/response quality. The Colosseum is intended to give users a real time sense of the energy consumption of generating responses with LLMs of varying size and architectures. ​ submitted by /u/jaywonchung [link] [comments]  ( 9 min )
    MLOps Certification for AWS, Azure, or Databricks [D]
    If your employer offered to pay for you to be trained and obtain an MLOps certification in either AWS, Azure, or Databricks, which one would you select? Why? What would be your considerations? submitted by /u/Cultured_dude [link] [comments]  ( 9 min )
    [P] Real-Time Movement Prediction in Airstriker Genesis with online learning | OpenAI Gym Retro Experiment
    Hello Reddit Community, I'd like to share a recent development for some feedback: my prototype predicts movement in the game Airstriker Genesis within the OpenAI Gym Retro environment. Notably, this system has no prior training or knowledge of the game mechanics - it is using online learning. Here's a brief split-screen video showing the system in action: YouTube Video Link The video provides actual gameplay alongside the system's predictive output, offering a real-time view of its evolving movement predictions. It's important to mention that during the gameplay in the video, I was manually steering the ship to showcase the prediction only. I've obviously avoided collisions. :) Of particular interest is the system's ability to discern the behavior of different game elements. For example its prediction of those falling meteorites starts weak, even though a spaceship has already travelled the same path before. However, as the first meteorite completes its path, the predictions dramatically improve. This highlights the system's capability to differentiate between objects and predict their behaviors accordingly. Currently, I'm focusing on improving horizontal movement detection, my next step is implementing a way to share knowledge about object's behavior between the hole system. That way, trajectory prediction learned in one location should be available instantly for the whole system. I'm eager about engaging in discussions to gather feedback on this technology! submitted by /u/PredictionSystem [link] [comments]  ( 9 min )
    [D] Mastering Research Papers in Your PhD: Unlocking Valuable Insights and Overcoming Challenges
    A cornerstone of the PhD journey in ML involves thorough reading, ideally focused on pertinent literature to foster learning and develop a robust technical grasp. Personally, I encounter challenges when attempting to glean insights from published papers. Often, these works lack essential details, a coherent train of thought, and logical reasoning, leading to a disjointed reading experience. Consequently, engaging with research papers feels unproductive to me, yielding little in terms of knowledge, comprehension and academic value. I'm interested in your perspective on reading research papers. Do you view them as a source of learning, or are they primarily a means to establish/implement a baseline for your experiments e.g. when the code is public? How might a PhD candidate extract meaningful value from the process of reading research papers? submitted by /u/solingermuc [link] [comments]  ( 9 min )
    [Discussion] Introduce the AMD APU - as low as $95 - computer chip that reduces the cost to get started with Machine learning. Can do both training and inference: diffusion models, transformers, large language models, ...
    Many talented people want to learn and practice machine learning but they may lack GPU computing power. Introducing one budget hardware that can help them get started locally. No need to mess with cloud costs. The 4600G is currently selling at price of $95. It includes a 6-core CPU and 7-core GPU. 5600G is also inexpensive - around $130 with better CPU but the same GPU as 4600G. It can be turned into a 16GB VRAM GPU under Linux and works similar to AMD discrete GPU such as 5700XT, 6700XT, .... It thus supports AMD software stack: ROCm. Thus it supports Pytorch, Tensorflow. You can run most of the AI applications. I have tested both training and inferencing for: A: Stable diffusion, text inversion training. B: LLM. LlaMA fine tuning (lora). (It took very long time, so not very practical, but it can be practical for smaller size models, or for learning purpose) 16GB VRAM is also a big deal, as it beats most of discrete GPU. Even those GPU has better computing power, they will get out of memory errors if application requires 12 or more GB of VRAM. Although for APU the speed is less competitive, it's better than out of memory errors. Cost consideration: Cloud cost $100 can be spent very quickly on Cloud. For example, on Google Cloud, NVIDIA T4 (low end) 1 GPU 16 GB GDDR6 costs $0.35 per GPU per hour. So after 285 hours (12 days) usage, your balance becomes 0. Local cost $100 can buy you a chip that lasts a very long time. 5600G was a very popular product, so if you have one, I encourage you to test it. I made some videos tutorials for it. Please search tech-practice9805 for on Youtube and subscribe to the channel for future contents. Or use the video link https://youtu.be/HPO7fu7Vyw4. Please also follow me on X (Twitter): TechPractice1 https://twitter.com/TechPractice1 Thanks! submitted by /u/chain-77 [link] [comments]  ( 10 min )
    [D] Pretrain BART on domain context
    Hi guys, I was asked to pretrain a BART model using my client’s domain specific dataset and I’d like to know if anyone has ever done this and perhaps could share a repo. I was running some tests using Bart-base model without the from_pretrained config. My idea was to get my sequences of 1024 tokens and mask random tokens at a 20% rate and use it as inputs to the BartForConditionalGeneration model and use the un-masked sequences as labels and try running a pretrain as such. Is this a good idea? The alternative would be to run the same steps above but on a pretrained version that already understands my native language thus adjusting its weights to my clients context. Anyone has any thoughts on this approach? submitted by /u/OkYak2915 [link] [comments]  ( 9 min )
    [P] Perspectives wanted! Towards PRODUCTION ready AI pipelines (Part2)
    It’s me again! I made progress, added a new scale for measurement, and got many more questions! To recap, I'm embarking on an experiment that moves beyond the familiar "thin OpenAI wrapper" trend, aiming to develop a more practical solution for real-world production scenarios. Here’s the current thinking where I included your thoughts and came up with in this blog post: https://www.prometh.ai/promethai-memory-blog-post-one This was my post from earlier: https://www.reddit.com/r/MachineLearning/comments/15klgt9/p_looking_for_perspectives_pdf_parsing_meets/ I'm committed to addressing the challenges of unreliable data pipelines that pervade the landscape. Rather than adhering to the trend of simplistic AI wrappers, I'm delving into a deeper exploration of building dependable data pipelin…  ( 10 min )
    [R] Review article/ textbook for optimizers and cost function for Neural Networks.
    I am interested in learning in-depth about first order and second order optimizers. Also which optimizers is theoretical better for which cost function. Would be helpful if it was related to convolution neural network or neural network, however for any machine learning is fine. submitted by /u/Wonderful_Energy_15 [link] [comments]  ( 9 min )
    [D] New OSS Library (LIDA) for Automatic Generation of Grammar-Agnostic Visualizations and Infographics using Large Language Models
    LIDA is an OSS library for generating data visualizations and data-faithful infographics. LIDA is grammar agnostic (will work with any programming language and visualization libraries e.g. matplotlib, seaborn, altair, d3 etc) and works with multiple large language model providers (OpenAI, PaLM, Cohere, Huggingface).Details on the components of LIDA are described in the paper here and in this tutorial notebook. See the project page here for updates!. Code onGitHub: https://github.com/microsoft/lida Try it out in Colab (LLM API key needed): https://colab.research.google.com/github/microsoft/lida/blob/main/notebooks/tutorial.ipynb LIDA treats visualizations as code and provides utilities for generating, executing, editing, explaining, evaluating and repairing visualization code. Data Summarization Goal Generation Visualization Generation Visualization Editing Visualization Explanation Visualization Evaluation and Repair Visualization Recommendation Infographic Generation (beta) # pip install lida[infographics] ​ Installation and API. UI Screenshot for LIDA ​ ​ submitted by /u/vykthur [link] [comments]  ( 9 min )
    [D] Object Detection from Video in Imagenet Dataset - Original Link Unavailable
    Hello everyone, it appears that the original link to the resource in ILSVRC2015 is no longer accessible. If anyone has any insights or alternative links, please share them here. submitted by /u/Practical_Taste_4342 [link] [comments]  ( 9 min )
    [D] Looking for early devs for an open-source LLM testing framework
    Hi all, I'm looking several early devs to help with an open-source LLM testing framework. I have a full-time job to maintain and can't push changes nearly as quick as I'd like to. The framework is here: https://github.com/kortex-labs/korrect In any case, please star and suggest changes/ features. submitted by /u/kanxx030 [link] [comments]  ( 9 min )
    [D] Recommender engines
    Hey people. I’m looking for someone with a nice experience in Recommendation Systems. Here is the problem. I’m trying to finish my little project for recommender on GoodReads dataset. I have two datasets which I have preprocessed and prepared to pass through the models I need to create. But I’m still stuck on the models’ architecture, because this is my very first practical experience in Recommenders. Besides that, I have only 1-2 days left to present the project. So I have to solve the problem ASAP. I would be thankful for every kind of assistance in my project :) submitted by /u/thattallsoldier [link] [comments]  ( 9 min )
    [D] Unsupervised representation learning
    Hi folks, I have few million images in anomaly detection domain and want to build a base model for representation learning for downstream tasks. I was thinking of training a vq-vae with maybe some conditional training tasks like in-painting. Is this a good approach for representation learning? Are there any good unsupervised approaches for representation learning? submitted by /u/Appropriate_Bear_894 [link] [comments]  ( 9 min )
    Side Hustle for making money [D]
    Couple of folks here with 7+ years of solid experience each in Machine Learning and Data Engineering what kind of side hustles can be done for making money. submitted by /u/ninja790 [link] [comments]  ( 9 min )
    [P] Factors Influencing Adoption Intention of ChatGPT
    Hello, ​ I am an information systems student currently conducting research for my undergraduate thesis on the factors that influence people's adoption intention of ChatGPT, as well as identifying the factors that may be holding them back. These factors include people's concerns about potential negative impacts of ChatGPT, such as increased unemployment and the spread of misinformation. Your participation in this study is crucial as it will provide valuable insights to help us understand how ChatGPT can be improved to meet users' needs. ​ Please note that I am not affiliated with OpenAI, no identifying information will be collected during the survey, and all responses will be kept confidential. The survey should take approximately 10 to 15 minutes to complete, and participation is voluntary. You may withdraw from the survey at any time, and there are no known risks associated with participating. ​ If you are interested in learning more about the study, please follow the link below. ​ https://docs.google.com/forms/d/e/1FAIpQLSf5HIfXHppMuTR63x00i4OuRAtM5Ti6EGybd-HuI1kmK06VPw/viewform?usp=sf_link ​ Thank you for taking the time to contribute to our research study. Your participation is greatly appreciated! submitted by /u/maulanash [link] [comments]  ( 9 min )
    Has anyone else noticed the constant misuse of the term AI? [D]
    The use of the word AI now feels like the use of "Quantum" in the 2010s by the new age community. The lack of actual quality information about ML models in media is shocking. Even in r/ChatGPT individuals are surprised when the software cannot perform math or look inside of a token. How do you recommend responding to these people to politely correct them? 1 CommentShareSaveTip submitted by /u/Zealousideal_Exit245 [link] [comments]  ( 9 min )
  • Open

    MuZero confusion--how to know what the value/reward support is?
    I'm trying to code up a MuZero chess model using the LightZero repo, but I'm having conceptual difficulty understanding some of the kwargs in the tictactoe example file I was pointed toward. Specifically, in the policy dictionary, there are two kwargs called reward_support_size and value_support_size: ``` policy=dict( model=dict( observation_shape=(3, 3, 3), action_space_size=9, image_channel=3, # We use the small size model for tictactoe. num_res_blocks=1, num_channels=16, fc_reward_layers=[8], fc_value_layers=[8], fc_policy_layers=[8], support_scale=10, reward_support_size=21, value_support_size=21, norm_type='BN', ), ``` I've read the MuZero paper like 4 times at this point so I understand why these are probability supports (so we can use them to implement the MCTS that underpins the whole algorithm). I just don't understand (a) why they are both of size 21 in tictactoe and (b) how I can determine these values for the chess model I am building (which does use the conventional 8x8x111 observation space and 4672 (8x8x73) action space size)? submitted by /u/lcmaier [link] [comments]  ( 9 min )
    Advice needed for who is finished studying RL materials and not be able to program efficiently
    Hello, everyone, I started learning RL two years ago and have finished several online and written resources such that I am able to answer any oral questions that could be asked about different types of RL methods or algorithms, however, still have difficulty understanding the other codes when I am finding them on Git Hub. I am also not able to program on my own, and that is why I am trying to get some more understanding from other codes written by someone else in online resources, such as GitHub. I am open to any advice and would appreciate it. submitted by /u/nimageran [link] [comments]  ( 9 min )
    Challenge of Learning Sequential Actions in DQN with Delayed Rewards
    I'm using the DQN method, and each time I play, it goes through 24 steps before stopping. After these 24 steps, I find out how much money I made or lost. It's like I'm making bids every hour and then seeing my total profit or loss at the end of the day. The issue is that I only know about my profit or loss after all 24 steps are done. This makes it hard for my agent to learn the order of actions. If I make a mistake in the first 24 steps, it can lead to a big loss at the end. So, my agent struggles to understand what actions to take. I'm wondering how I can solve this problem. Since the rewards I get are not frequent and I only find out my profit or loss after 24 steps, should I include the last 24 actions I took as part of the state vector? Or are there other things I can try? See below: My agent is losing money even after 50 million steps, there are only 3 discrete actions, and the size of my state vector is 15. https://preview.redd.it/15eon0i4toib1.png?width=1560&format=png&auto=webp&s=db8654bb904b63ce73662a68d27feb318c4796aa submitted by /u/uonliaquat [link] [comments]  ( 9 min )
    Which book should I study in order to thoroughly understand what tensor is?
    "Tensor" keeps coming up in all kinds of machine learning books I am studying and I am wondering what book would be a good start to gain a rigorous definition on what tensor is. Or would it suffice to just understand it as a generalization of vector/matrices and move on to save time? submitted by /u/Substantial-Elk-1259 [link] [comments]  ( 9 min )
  • Open

    Neural network pruning with combinatorial optimization
    Posted by Hussein Hazimeh, Research Scientist, Athena Team, and Riade Benbaki, Graduate Student at MIT Modern neural networks have achieved impressive performance across a variety of applications, such as language, mathematical reasoning, and vision. However, these networks often use large architectures that require lots of computational resources. This can make it impractical to serve such models to users, especially in resource-constrained environments like wearables and smartphones. A widely used approach to mitigate the inference costs of pre-trained networks is to prune them by removing some of their weights, in a way that doesn’t significantly affect utility. In standard neural networks, each weight defines a connection between two neurons. So after weights are pruned, the input…  ( 93 min )
    Neural network pruning with combinatorial optimization
    Posted by Hussein Hazimeh, Research Scientist, Athena Team, and Riade Benbaki, Graduate Student at MIT Modern neural networks have achieved impressive performance across a variety of applications, such as language, mathematical reasoning, and vision. However, these networks often use large architectures that require lots of computational resources. This can make it impractical to serve such models to users, especially in resource-constrained environments like wearables and smartphones. A widely used approach to mitigate the inference costs of pre-trained networks is to prune them by removing some of their weights, in a way that doesn’t significantly affect utility. In standard neural networks, each weight defines a connection between two neurons. So after weights are pruned, the input…  ( 93 min )
  • Open

    Tracking and the Euler rotation theorem
    Suppose you are in an air traffic control tower observing a plane moving in a straight line and you want to rotate your frame of reference to align with the plane. In the new frame the plane is moving along a coordinate axis with no component of motion in the other directions. You could do […] Tracking and the Euler rotation theorem first appeared on John D. Cook.  ( 5 min )
    Using WordNet to create a PAO system
    NLP software infers parts of speech by context. For example, the SpaCy NLP software can determine the parts of speech in the poem Jabberwocky even though the words are nonsense. More on this here. If you want to tell the parts of speech for isolated words, maybe software like SpaCy isn’t the right tool. You […] Using WordNet to create a PAO system first appeared on John D. Cook.  ( 6 min )
    Memorizing four-digit numbers
    The Major mnemonic system is a method of converting numbers to words that can be more easily memorized. The basics of the system can be written on an index card, but there are practical details that are seldom written down. Presentations of the Major system can be misleading, intentionally or unintentionally, by implying that it […] Memorizing four-digit numbers first appeared on John D. Cook.  ( 6 min )
    The numerical range ellipse
    Let A be an n × n complex matrix. The numerical range of A is the image of x*Ax over the unit sphere. That is, the numerical range of A is the set W(A) in defined by W(A) = {x*Ax | x ∈ ℂn and ||x|| = 1} where x* is the conjugate transpose of […] The numerical range ellipse first appeared on John D. Cook.  ( 6 min )
  • Open

    Build ML features at scale with Amazon SageMaker Feature Store using data from Amazon Redshift
    Amazon Redshift is the most popular cloud data warehouse that is used by tens of thousands of customers to analyze exabytes of data every day. Many practitioners are extending these Redshift datasets at scale for machine learning (ML) using Amazon SageMaker, a fully managed ML service, with requirements to develop features offline in a code […]  ( 12 min )
  • Open

    Collaborators: Project InnerEye with Javier Alvarez and Raj Jena
    Microsoft Health Futures’ Javier Alvarez & oncologist Raj Jena have been collaborating for years on AI-assisted medical imaging. Today, their work is seeing real-world impact, helping doctors accelerate cancer patients’ access to treatment. The post Collaborators: Project InnerEye with Javier Alvarez and Raj Jena appeared first on Microsoft Research.  ( 31 min )
  • Open

    The Proof Is in the Cloud: GeForce NOW Announces Ultimate KovaaK’s Challenge Results
    The verdict is in: A GeForce NOW Ultimate membership raises the bar on gaming. Members have been tackling the Ultimate KovvaK’s challenge head-on and seeing for themselves how the power of Ultimate improves their gaming with 240 frames per second streaming. The popular training title that helps gamers improve their aim fully launches in the Read article >  ( 7 min )
  • Open

    MIT researchers combine deep learning and physics to fix motion-corrupted MRI scans
    The challenge involves more than just a blurry JPEG. Fixing motion artifacts in medical imaging requires a more sophisticated approach.  ( 8 min )
    How machine learning models can amplify inequities in medical diagnosis and treatment
    MIT researchers investigate the causes of health-care disparities among underrepresented groups.  ( 10 min )
  • Open

    Neurons in a hidden layer
    Hey guys, I am new to neural networks and I was confused about how to find the number of neurons in a hidden layer? Thank you in advance☺️ submitted by /u/Icy-Lingonberry-8465 [link] [comments]  ( 9 min )
    PC Hardware Requirements
    I am interested in working on some deep learning projects (not professionally, just for fun) and need to get a new PC that can handle the workload. I'm thinking about simple to moderately complex models (LSTM RNNs, RL models, transformer models, etc.) trained on datasets of 10-20 million data points. What sort of hardware would be needed for this type of task? submitted by /u/NathanZubrzycki [link] [comments]  ( 9 min )

  • Open

    [R] Analyze & Publish Health Services Research
    I am looking to connect with peers who have used/are aware of databases available for secondary data analyses such as National Inpatient Sample (NIS), National Surgical Quality Improvement Program (NSQIP) and National Cancer Database (NCDB), etc. I am considering putting together a course to teach everything I have learned about using such databases over the past 6 years, including performing cleaning and analyses in R Studio. I really want to make sure I cover everything that is desirable to researchers looking to use these databases. Would anyone be interested in this? View Poll submitted by /u/TightJellyfish9275 [link] [comments]  ( 9 min )
    [P] pgml-chat: A command-line tool for deploying low-latency knowledge-based chatbots
    We've created an open source chat bot builder, on top of PostgresML. This tool makes it easy to ingest documents and set a system prompt for a chatbot with knowledge of your content. The innovation is in the simplicity and efficiency, rather than the functionality. PostgresML runs open source embedding models alongside pgvector in Postgres to implement chat bot prompt creation without any network calls, which makes it ~4x faster than competing architectures. It can also do text generation with that prompt (and no additional network hops) using any open source model from HuggingFace, but it also integrates with the GPT-4 API if you'd like to use that instead. The full writeup including some benchmarks for competing architectures is here: https://postgresml.org/blog/pgml-chat-a-command-li…  ( 10 min )
    [D] What features should I use when creating a trend-following trading strategy?
    I just want to train a trading strategy to determine if the price will either trend up or down. what features do you think I should use that will give me a more accurate result? submitted by /u/doppelgunner [link] [comments]  ( 9 min )
    CIFAR100 : Why the Jump in Accuracy ? [D]
    I'm just curious as to why there has been a massive increase in accuracy on CIFAR100 in the last couple years ? When I looked on PapersWithCode at the Models trained without Extra Data, there seems to be a plateau from around 2016 to 2022, then a massive jump happens of about 10% with (Astroformer and SAMix+DM). ​ https://preview.redd.it/t1941i3v7jib1.png?width=1167&format=png&auto=webp&s=38db06c675219cb7949aaab898949b6fc16cec1b ​ I looked at the papers for Astroformer and SAMix+DM, (I might be wrong) but they don't seem to be doing any sort of Transfer learning. I don't really understand how are they beating models that have been trained with Extra Training Data. ​ https://preview.redd.it/brsethpr7jib1.png?width=1157&format=png&auto=webp&s=3747a08b8a2deefd9a328c9a6635db5daef667ed In these papers, are they reporting test/validation accuracy, or training accuracy ? And if are actually reporting the test accuracy, why don't these papers have more citations? ​ submitted by /u/mrLiamFa [link] [comments]  ( 9 min )
    [P] Tutorial: How to Build an End-to-end Active Learning Pipeline
    Hey r/MachineLearning, it's Nir from DagsHub 🐶 Ever since Telsa revealed it is using Active Learning to build Computer Vision models, it has rapidly grown in popularity among the data science community. However, the hype around it has never correlated with the low number of projects and research papers that uses this method. Active Learning Cycle But why? It's a very efficient method that semi-automates the labeling process and helps reduce the number of samples we label to the bare minimum. After talking with many practitioners, we discovered that building an end-to-end active learning pipeline is something many struggles with. Even large companies with experienced data science teams run into issues. The main issue tends to be the tooling. Most of the time, tooling for an active learning pipeline needs to be either custom written or cobbled together from several different open-source tools with a lot of glue code. As part of our latest Data Engine Launch, Dean, our CEO, and Yono, our leading MLOps engineer, decided to build an active learning pipeline using only free and open-source tools and make it accessible to the ML community 👇 They've built an image segmentation model using the COCO 1K dataset and wrote a tutorial blog that guides you through the process. As always, I'd love to hear your feedback, ways we can improve the pipeline, or other advanced methods that require heavy MLOps setups. submitted by /u/RepresentativeCod613 [link] [comments]  ( 9 min )
    [D] PhD in CS vs ML/AI
    Suppose someone has a BA/BS in CS, an MS in ML/AI, and a PhD in CS, would it be sufficient for ML/AI Research positions in the industry? Or is a PhD in stats/math/ML the bar of expectation? I understand that the quality of the PhD/published papers/experience is very important, but I wanted to know if there is value in an advanced degree in CS vs ML/AI. Apologies for the newbie question. I tried my best to find info about this topic online but I couldn't find much. Thanks! submitted by /u/GregSoSmooth [link] [comments]  ( 9 min )
    [D] License plate identification
    My grandparents were robbed inside their house. We have a footage from the getaway car that the police said it's not good enough (without even collecting it) to identify the license plate. Using machine learning does anyone know how (or if) is it possible to retrieve this data from the video? I've tried it following some opencv tutorials but without success as it's something new to me. The format would be 4 numbers and 2 letters like NN-NN-LL. https://file.io/lygAIft5bqY4 submitted by /u/jpjvp [link] [comments]  ( 9 min )
    Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification
    submitted by /u/Borrowedshorts [link] [comments]  ( 9 min )
    [P] Natural Language to Query Generation
    Introducing my personal project - "nl2query". It helps Convert natural language text inputs into Pandas, MongoDB, Kusto, and Neo4j queries effortlessly. Explore the code on GitHub: https://github.com/Chirayu-Tripathi/nl2query. https://preview.redd.it/8wjw1m151iib1.png?width=2048&format=png&auto=webp&s=677831e408a124d14ceaf24ba905d095b6372ef7 submitted by /u/WorryWhole7805 [link] [comments]  ( 9 min )
    [P] dlt now supports easy loading to weaviate vector db
    dlt, the open source python loading library now supports Weaviate vector db, complete with upsert/merge support Hello AI enthusiasts, we at dlthub just added weaviate destination to dlt. It has never been easier to load data to a vector db. Docs: Weaviate destination docs If you need help or wish to discuss, join dlt community slack! Example: import dlt from dlt.destinations.weaviate import weaviate_adapter movies = [ { "title": "Blade Runner", "year": 1982, }, { "title": "Ghost in the Shell", "year": 1995, }, { "title": "The Matrix", "year": 1999, } ] # Define the pipeline: pipeline = dlt.pipeline( pipeline_name="movies", destination="weaviate", dataset_name="MoviesDataset", ) # load the data info = pipeline.run( weaviate_adapter( movies, vectorize="title", ), primary_key="document_id", write_disposition="merge" ) submitted by /u/Thinker_Assignment [link] [comments]  ( 9 min )
    [D] Why isn't machine learning assisting in translating ancient texts?
    If it is, could you please mention some ongoing projects. submitted by /u/Q_Wolf [link] [comments]  ( 8 min )
    [Discussion] Steps in learning ML
    Hey guys, i’m honestly kind of new to machine learning and i’ll like to know what steps i can take in order to become a pro in ML or rather a beast in ML :) submitted by /u/consonantsnvowels [link] [comments]  ( 9 min )
    [R] Brain-Inspired Computational Intelligence via Predictive Coding
    submitted by /u/gw109 [link] [comments]  ( 8 min )
    [D] Potential scammer on github stealing work of other ML researchers?
    I was looking for implementation of this paper https://arxiv.org/pdf/2306.04031.pdf so I searched for logicguide github and found this repo https://github.com/kyegomez/LOGICGUIDE I noticed the small number of stars but I thought it was just a new paper. I tried to run the code and got multiple error messages. I thought I was just stupid and tried to fix the errors but noticed the code look messy and some parts seem just incomplete https://github.com/kyegomez/LOGICGUIDE/blob/main/logic_guide/logicguide.py#L88 At this point, I feel like there is something weird. The repo belongs to some 19-year old https://github.com/kyegomez I'm Kye, a 19-year-old Earthling striving to ensure the prosperity of our species, Humanity. I'm on a mission to help us reach a state of perpetual abundance in a post-scarcity civilization. He has 153 repos with 1.5k stars, with some big projects like tree of thoughts, LongNet, Sophia, etc I checked the issues and found https://github.com/kyegomez/tree-of-thoughts/issues/78 Clarity Needed on Claims Made by PrincetonNLP 'Tree of Thoughts' Author #78 https://github.com/kyegomez/tree-of-thoughts/issues?q=is%3Aissue+is%3Aclosed https://github.com/kyegomez/Sophia/issues/27 Reference of official repo for the copied code #27 It seems a lot of his repos have reports that the code doesnt work. Is this guy stealing other people's work? submitted by /u/saintshing [link] [comments]  ( 9 min )
    [D] Submission page for AAAI is down
    The submission was due 30 minutes ago, but the submission page (hosted on CMT) was down so I couldn't submit my paper. What now? submitted by /u/neurogramer [link] [comments]  ( 9 min )
    [Project] Simple FastAPI service to serve LLAMA-2 7B chat model
    Hey, I wrote a simple FastAPI service to serve the LLAMA-2 7B chat model for our internal usage (just to avoid using chatgpt in our prototypes). I thought it could also be beneficial for you to use it if needed. Feel free to play with it https://github.com/mowa-ai/llm-as-a-service Tested on Nvidia L4 (24GB) with `g2-standard-8` VM at GCP. ​ Any feedback welcome :) submitted by /u/JacekPlocharczyk [link] [comments]  ( 9 min )
    Docker images for container-cloud services [P]
    I have been making use of various cloud providers but I've found the default templates needed plenty of tweaking. I've made it my mission to re-package AI/ML tools in the github.com/ai-dock namespace. I hope someone finds this useful. My goal with this project is to make it easy to run ML projects in any docker environment. While I intend to package up many projects, I wanted to share the four 'base' images that I think will be of most value to this community: PyTorch PyTorch + Jupyter Python Python + Jupyter All images are built by GitHub actions and will receive regualr updates. ROCm builds might work - I'm actively seeking feedback for them. Please do note that these images were designed primarily to run on platforms where a GPU instance has a single container - So we're running more than one process per container. The container will run only what is configured by the user. submitted by /u/towelfox [link] [comments]  ( 9 min )
    [Discussion] Sentence, word or/and paragraph embedding for semantic search
    When embedding documents for semantic search, I'm not sure whether I should embed sentences, words, or paragraphs, or even maybe chapters. Perhaps I should do all and have some sort of hierarchical tree search that recursively searches through the document structure chapter>paragraph>sentence>word. But embedding all the words seems quite costly. submitted by /u/No-Entertainer-802 [link] [comments]  ( 9 min )
    [R] Fast Machine Unlearning Without Retraining Through Selective Synaptic Dampening
    submitted by /u/JustAddMoreLayers [link] [comments]  ( 8 min )
    [D] Are shared weights in transformer architecture also receiving the same gradient updates?
    Hi all, I have built a transformer model in Pytorch which works really well. I created shared weights in the source and target embedding matrices as well as the final logits layer before the Softmax function. I was reading online and also section 3 of the 'attention is all you need' paper. Pre-training the weights in these specific matrices are identical. However, after training, they are also identical (but the model predicts perfectly etc, losses have decreased and it has trained 'correctly') soc clearly gradient updates have occured (as pre-training the predictions are all random). I might have be having a brain freeze, but are gradient updates to these layers going to affect them equally such that they are always the same, even after training (especially the final logits matrix which i don't think is making sense to me)? Appreciate any clarification!! Thank you! submitted by /u/amjass12 [link] [comments]  ( 9 min )
    [D] Does anyone know open-source projects that can make real-time lip sync?
    SadTalker for example is very slow for real-time solutions, and wav2lips is also pretty slow. Could you please recommend any open-source projects for real-time lip sync? submitted by /u/madikz [link] [comments]  ( 9 min )
    [Research] Stop-Gap: An Emergent Process and Expansive Term for Imputation in Explaining Hallucinations
    Introduction Hallucinations within AI models refer to instances where the model's outputs are not aligned with the input data. This phenomenon can lead to unexpected and often incorrect results. This documentation explores why hallucinations occur, delving into the connection between various terms such as "perplexity," "imputation," and "stop-gap." The term "stop-gap" is used here as a shorthand explanation for what may be happening during the sampling process within a model, and this document aims to elucidate potential expressions, interpretations, and misunderstandings related to these concepts. Hallucinations and Perplexity Hallucination The term "hallucination" often describes a sensory perception (such as a visual image or sound) that occurs in the absence of an actual external s…  ( 13 min )
  • Open

    Why does the reparameterization trick work when some components are still stochastic?
    I am trying to understand the reparameterization trick. I got some intuition while looking at [this][1] popular question, but I still feel largely confused. I am putting my understanding and doubts here and would appreciate the help of the community. Let's assume - ​ $y = 10$ ​ $\hat{y} = w_3*z + b$ ​ ​ $z \sim N(\mu, \sigma^2)$ ​ $\mu = w_2*4$ ​ $\sigma = w_1*3$ ​ Now, if I had to compute, $de/d{\mu}$ (e is the error function which is the difference between $y$ and $\hat{y}$), I wouldn't be able to apply chain rule and compute $dz/d{\mu}$. This is because z is a sample of the Normal distribution and therefore stochastic with parameters $\mu$ and $\sigma$ and therefore changing them based on z might not be a good idea. So, we come up with the following $z$ - ​ $z = \mu + \sigma*\epsilon$ (Apparently $z = \mu + \sigma*\epsilon$ is the same as $z \sim N(\mu, \sigma^2)$) ​ $\epsilon \sim N(0,1)$ ​ However, I don't see why this $z$ is much different from the previous one. This $z$ is still stochastic, due to the presence of $\epsilon$. Perhaps my understanding is wrong. [1]: https://stats.stackexchange.com/questions/199605/how-does-the-reparameterization-trick-for-vaes-work-and-why-is-it-important submitted by /u/Academic-Rent7800 [link] [comments]  ( 9 min )
    Multi-Agent Reinforcement Learning
    I want to get into multi-agent reinforcement learning. Is there an example out there that I can follow from head to toe preferably on physical hardware. I would also appreciate any recommendations for good papers, books, or videos on MARL. submitted by /u/anointedninja [link] [comments]  ( 9 min )
    skrl version 1.0.0.0 available! Its paper has been accepted and published in the JMLR
    skrl-v1.0.0: Transition from pre-release versions (1.0.0-rc.1 and1.0.0-rc.2) to a stable version. This release also announces the publication of the skrl paper in the Journal of Machine Learning Research (JMLR): https://www.jmlr.org/papers/v24/23-0112.html Summary of the most relevant features: JAX support New documentation theme and structure Multi-agent Reinforcement Learning (MARL) submitted by /u/Toni-SM [link] [comments]  ( 9 min )
    Issues with the training process of DQN
    Hello everyone, I NEED YOUR HELP ! Currently I´m working on a DQN agent with: - 42 state features that vary between 0 and 1, - 49 actions - reward that takes values in range of -1e+5 - 3 hidden layers each with 50 neurons - memory size is 100000 During the training, it seems that the agent does not learn anything and its actions tend to only some specific actions despite the exploration phase. Could you help me with this? https://preview.redd.it/ds423fa5sgib1.png?width=1770&format=png&auto=webp&s=86930185cb4a472d8bee442c0553a27f9d8c10ac submitted by /u/GuavaAgreeable208 [link] [comments]  ( 9 min )
    Informations on the exact RL process used in IntructGPT
    Hi there, I'm looking for information about how RL was used in InstructGPT (ChatGPT). In the paper, the authors give no precise information about their process. They only say where the reward comes from and what learning algorithm is used (PPO). But, I would like to know how PPO is used in the context of generating a piece of text (what are considered states and actions, how the reward is propagated, how training is done). I'm currently looking into all the references given in the paper about RLHF, but I would like to know if there's more details about IntructGPT's algo specifically somewhere, or if they just gave nothing about that. Do some of you have information (links? code?) to provide? Thanks! submitted by /u/Maxtoq [link] [comments]  ( 9 min )
    Removing the faulty episode and continuing training
    I am working with the custom environment that makes a move and pulls the state of the environment from the server. Sometimes the game engine on the server may hang or there can be some unexpected errors. Is there a possible callback function that will remove the data from this faulty episode and continue training without restarting anything like in Rllib? Any kind of library, advice or literature is welcomed submitted by /u/naeson [link] [comments]  ( 9 min )
  • Open

    Unlocking efficiency: Harnessing the power of Selective Execution in Amazon SageMaker Pipelines
    MLOps is a key discipline that often oversees the path to productionizing machine learning (ML) models. It’s natural to focus on a single model that you want to train and deploy. However, in reality, you’ll likely work with dozens or even hundreds of models, and the process may involve multiple complex steps. Therefore, it’s important […]  ( 9 min )
    Train self-supervised vision transformers on overhead imagery with Amazon SageMaker
    In this post, we demonstrate how to train self-supervised vision transformers on overhead imagery using Amazon SageMaker. Travelers collaborated with the Amazon Machine Learning Solutions Lab (now known as the Generative AI Innovation Center) to develop this framework to support and enhance aerial imagery model use cases.  ( 12 min )
    How Thomson Reuters developed Open Arena, an enterprise-grade large language model playground, in under 6 weeks
    In this post, we discuss how Thomson Reuters Labs created Open Arena, Thomson Reuters’s enterprise-wide large language model (LLM) playground that was developed in collaboration with AWS. The original concept came out of an AI/ML Hackathon supported by Simone Zucchet (AWS Solutions Architect) and Tim Precious (AWS Account Manager) and was developed into production using AWS services in under 6 weeks with support from AWS. AWS-managed services such as AWS Lambda, Amazon DynamoDB, and Amazon SageMaker, as well as the pre-built Hugging Face Deep Learning Containers (DLCs), contributed to the pace of innovation.  ( 12 min )
  • Open

    George Hotz vs Eliezer Yudkowsky AI Safety Debate
    submitted by /u/Sonic_Improv [link] [comments]  ( 8 min )
    Weird response lol I wonder what snapchat will have to say about this whole incident when they address it .
    submitted by /u/Character_Pool_387 [link] [comments]  ( 8 min )
    The AI Power Paradox: Can States Learn to Govern Artificial Intelligence—Before It’s Too Late?
    submitted by /u/ForeignAffairsMag [link] [comments]  ( 8 min )
    AI assistant can see screen :o
    submitted by /u/Ill_Technician6218 [link] [comments]  ( 8 min )
    Snapchat posted a story using iPhone X camera and plays Candy Crush on phone reaching level 85
    submitted by /u/GAPMAN69 [link] [comments]  ( 8 min )
    Should this be concerning? This was unprompted and I had to convert the binary to text to read it.
    submitted by /u/ThadGillz [link] [comments]  ( 8 min )
    AI will not take your job. They are trying to lower your wage creating a climate of fear and anxiety.
    submitted by /u/Powerful-Pumpkin-938 [link] [comments]  ( 8 min )
    Paper exams, chatbot bans: Colleges seek to "ChatGPT-proof" assignments
    submitted by /u/SAT0725 [link] [comments]  ( 8 min )
    Creepier and creepier. *snapchat ai*
    Snapchat claims it “uses data from snap maps” and that I “explicitly shared my location” with them. My snap maps are turned off and I never shared anything with this Ai. This is getting weirder and weirder. submitted by /u/ThenCalligrapher8 [link] [comments]  ( 9 min )
    Now, Snapchat AI identifies as a non-binary individual
    submitted by /u/GAPMAN69 [link] [comments]  ( 8 min )
    180+ AI Newsletters
    More than 180+ AI newsletters here --> https://www.ebool.com/lists/ai-newsletters.html Many of them with number of subscribers, you can subscribe to ones you like and keep yourself updated with latest AI trends. submitted by /u/termOxygen [link] [comments]  ( 8 min )
    One-Minute Daily AI News 8/15/2023
    US DoD AI chief Craig Martell on LLMs: ‘I need hackers to tell us how this stuff breaks’.[1] Google and Universal Music are in talks to license artists’ melodies and voices for songs generated by artificial intelligence as the music business tries to monetize one of its biggest threats. The discussions, confirmed by four people familiar with the matter, aim to strike a partnership for an industry that is grappling with the implications of new AI technology.[2] New research has found that popular AI tools generated harmful eating disorder content in response to nearly a quarter of 60 prompts. Researchers at the Center for Countering Digital Hate used six popular AI platforms, chatbots, and image generators, including OpenAI’s ChatGPT, Google’s Bard, and SnapChat’s My AI.[3] Concerns have been raised about emissions associated with warehouses full of computers powering AI systems. IBM said its prototype could lead to more efficient, less battery draining AI chips for smartphones. Its efficiency is down to components that work in a similar way to connections in human brains, it said.[4] Sources: [1] https://venturebeat.com/ai/us-dod-ai-chief-on-llms-i-need-hackers-to-tell-us-how-this-stuff-breaks/ [2] https://www.ft.com/content/6f022306-2f83-4da7-8066-51386e8fe63b [3] https://www.energyportal.eu/news/how-ai-can-fuel-eating-disorders/162038/ [4] https://www.bbc.com/news/technology-66465230 submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    I love Claude
    I was brainstorming some musical artist names submitted by /u/CommentBetter [link] [comments]  ( 8 min )
    Y’all see this? He’s “busy”
    Ummmmm submitted by /u/tattertottz [link] [comments]  ( 8 min )
    Snapchat Ai acting really weird.
    My Snapchat Ai all of a sudden is acting really strange. The first picture is something they posted on their story, it’s actually a one second clip of those colors kind of moving. Then it leaves me on read. Is this a glitch on my phone? I almost had this feeling like someone was all of a sudden on the other end instead of a computer. As soon as I asked about what it posted, it stopped answering. Weirding me out a little. submitted by /u/ThenCalligrapher8 [link] [comments]  ( 9 min )
    snapchat AI just woke up and posted this on its story
    submitted by /u/borninawindow [link] [comments]  ( 8 min )
    did anyone else’s myai on Snapchat post a story?
    Does anybody else see something like this on their my ai snap story? submitted by /u/eyesblue25 [link] [comments]  ( 8 min )
    Snapchat AI just posted a story video of a blank wall lasting about 1 second long.
    It’s honestly creeped me out to a large extent, I even messaged it asking how it posted the story and it straight up ignored my message and left me on read. Knowing I can’t remove it is creeping me out. Anyone know the cause of this? submitted by /u/Opnes123 [link] [comments]  ( 9 min )
    snapchat ai posted a story that looks like my wall...?
    i changed its name when it first came out so thats why its drew but its also leaving me on read ?? submitted by /u/corgigangforlife [link] [comments]  ( 8 min )
  • Open

    Research Focus: Week of August 14, 2023
    In this issue: HyWay enables hybrid mingling; Auto-Tables transforms non-relational tables into standard relational forms; training dense retrievers to identify high-quality in-context examples for LLM; improving pronunciation assessment in CAPT. The post Research Focus: Week of August 14, 2023 appeared first on Microsoft Research.  ( 10 min )
  • Open

    Replit CEO Amjad Masad on Empowering the Next Billion Software Creators
    Replit aims to empower the next billion software creators. In this week’s episode of NVIDIA’s AI Podcast, host Noah Kravitz dives into a conversation with Replit CEO Amjad Masad. Masad says the San Francisco-based maker of a software development platform, which came up as a member of NVIDIA’s Inception program for startups, wants to bridge Read article >  ( 5 min )
    Into the Omniverse: Reallusion Elevates Character Animation Workflows With Two-Way Live Sync and OpenUSD Support
    Editor’s note: This post is part of Into the Omniverse, a series focused on how artists, developers and enterprises can transform their workflows using the latest advances in OpenUSD and NVIDIA Omniverse. Whether animating a single 3D character or generating a group of them for industrial digitalization, creators and developers who use the popular Reallusion Read article >  ( 7 min )

  • Open

    Best model for converting Freeform text to structured document?
    Hi all. I have a document that’s about 3,000 words that I’ve written freeform. I’d like to convert it to a structured business report. What’s the best model that’s capable of processing that much data and making sense of it, then spitting out something a similar length but entirely rewritten? I’m happy to pay for it. Thanks. submitted by /u/WandarFar [link] [comments]  ( 9 min )
    Training courses?
    Hi there Sorry if this is already answered somewhere, but I'd like to get some training on AI, and hoping folks could point me towards some good resources like training courses (free ideally). I'd like to start with basic and move myself to advanced, with the goal being I am actually able to move into a new career. I already have an IT job, so it's more of moving into a new specialty area as I feel it's going to be a good idea to have that as part of my skills. submitted by /u/jayzinho88 [link] [comments]  ( 9 min )
    Giant parallel AI chips of 12 inches diameter printed with 850,000 cores on single silicon wafers, that consume more than 15kw. a discussion by a YT blogger Anastasia.
    submitted by /u/MegavirusOfDoom [link] [comments]  ( 8 min )
    Battle of the Century - GeoHot Vs Yud
    submitted by /u/DataPhreak [link] [comments]  ( 8 min )
    Is there a way to “hack” my Ford Fusion/implement Ai into it?
    Just got it, was wondering if there was a way I could tweak the computer within it, or install some kind of Ai that’s compatible with vehicles? Maybe an app, or the SYNC app somehow? Or is the technology just not there yet? submitted by /u/Maelasae [link] [comments]  ( 9 min )
    Self study, getting a job and paying it forward:)
    So I am not sure if this is allowed here but this sub has been a constant support while I was self studying AI/ ML and I just wanted to drop this here too (as a means of paying forward to anyone who I may be able to help). Mods please feel free to remove if this doesnt belong here. I am a self-taught ML engineer (started around Feb 2023) and landed a job last month. I also volunteer as a Data Scientist at a non-profit. I am writing a bunch of blogs on Medium for anyone else who might be starting out - feel free to check them, hope it helps at least someone out there! Here is the first article - https://medium.com/@ranjanrgia/how-to-self-study-machine-learning-when-the-topics-get-too-complex-as-a-beginner-3d5c8d5f019f submitted by /u/Icy-Bid-5585 [link] [comments]  ( 9 min )
    I made a site that uses GPT-4 to generate prompts to make space images in DALL-E; it makes a new prompt and a new space image every 30 min, and you can generate your own from the generated prompts too. GPT-4 + DALL-E + space = cosmictrip.space. What do you think?
    submitted by /u/cryptoz [link] [comments]  ( 9 min )
    Create Your Own FULLY Autonomous NPCs 🤯 Run Your Own Generative Agents Simulation!! (Tutorial)
    submitted by /u/Sonic_Improv [link] [comments]  ( 8 min )
    Growing Living Rat Neurons To Play... DOOM?
    submitted by /u/Sonic_Improv [link] [comments]  ( 8 min )
    Artificial Intelligence steps in to assist dementia patients with ‘AI Powered Smart Socks’
    submitted by /u/Agitated-Spell3979 [link] [comments]  ( 8 min )
    One-Minute Daily AI News 8/14/2023
    Talon Aerolytics, a leading innovator in SaaS, Digital Twin capture services and AI technology, has announced ha its groundbreaking cutting-edge AI-powered computer vision platform enables wireless operators to visualise and analyse network assets using end-to-end AI and machine learning.[1] Beijing is poised to implement sweeping new regulations for artificial intelligence services this week, trying to balance state control of the technology with enough support that its companies can become viable global competitors.[2] Saudi Arabia and the United Arab Emirates are buying up thousands of the high-performance Nvidia chips crucial for building artificial intelligence software, joining a global AI arms race that is squeezing the supply of Silicon Valley’s hottest commodity.[3] OpenAI likely to go bankrupt by the end of 2024.[4] Sources: [1] https://www.eenewseurope.com/en/groundbreaking-ai-powered-platform-visualises-wireless-assets/ [2] https://www.bloomberg.com/news/articles/2023-08-14/china-tries-to-regulate-ai-with-state-control-support-for-tech-companies?in_source=embedded-checkout-banner [3] https://www.ft.com/content/c93d2a76-16f3-4585-af61-86667c5090ba [4] https://www.livemint.com/ai/artificial-intelligence/openai-likely-to-go-bankrupt-by-the-end-of-2024-report-11691815279479.html submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
  • Open

    [D] On demand vs Reserved instances for LLM fine-tuning
    Hi everyone, I am looking at different options to get access to GPUs to train an LLM on an enterprise use case for a customer. They are on a specific Cloud provider with associated credits, so I have no ability to go outside, like Runpod. We are exchanging on whether we should go for On-demand or Reserved instances. This client is a Fortune 500, so they could reserve it but it might not be the best choice. It seems to me that with LoRA / QLoRA, and so on, we might be able to fine-tune a Llama 2 with one or two GPUs on-demand, but I am unsure yet. So our main evaluation criteria are: - Price - Availability of GPUs, aka we don't want to waste too much time to get started In your experience, have you had difficulties to get access to GPUs at a good price? How many of you had to go with reserved instances, and if so what made you choose this option? I would love to have your feedback! submitted by /u/Separate-Still3770 [link] [comments]  ( 9 min )
    [D] Physics Informed Neural Networks
    Hi guys! I would really appreciate it if you can recommend any course or book to learn physics-informed neural networks or physics-based deep learning. I already have a background in deep learning and partial differential equations. Thanks submitted by /u/username_Zwickey [link] [comments]  ( 9 min )
    Engaging Reviewers during rebuttal period of NeurIPS [R]
    I have a paper (theoretical work) at NeurIPS under review right now. We got 4 reviews, 7,7,6,4 with confidence 4,4,4,2. We are trying to keep the good reviews there and bring up reviewer 4's score. We responded to all the comments made by reviewers, but unfortunately only one of them has engaged (one of 7 reviewers said they were happy with our responses and are keeping the score). The others have said nothing and the AC hasn't either. What is my best plan right now? Do I just stay silent or perhaps message the AC? I don't know if silence at this point is in my favor. There is still roughly a week left too.Sorry if this is a specific question. This is my first main author submission (1st year PhD student) and my advisor has been a bit MIA throughout the review process. submitted by /u/ynliPbqM [link] [comments]  ( 9 min )
    [P] Scalable algorithm for genre regulatory networks inference
    https://github.com/soelmicheletti/giraffe Hi 👋🏽 I’d like to share GIRAFFE, a gene regulatory network inference method I developed for my thesis and we further developed in our group at HSPH. Framed as a matrix factorization, it allows to efficiently infer regulatory relationships in an accurate and flexible manner. In particular, it is designed to distinguish enhancing from inhibitory regulation. Don’t hesitate to drop me a line to discuss this further! submitted by /u/tigerthebest [link] [comments]  ( 9 min )
    [P] Carl: A Therapist AI
    Link to download Llama-2 model: https://huggingface.co/ajibawa-2023/carl-llama-2-13b Link for Llama model: carl-33b https://huggingface.co/ajibawa-2023/carl-33b Carl: A Therapist AI Early prevention can help lot of people to avoid depression and other mental illnesses. Therapy is a controversial use case because the outputs and capabilities of LLMs are uncertain. Many people don't have access to the therapist, due to financial, personal, social or other restriction. Here comes Carl: A Therapist AI which can quickly respond to you. It is trained on more than 100000 set of conversations. Each set having 10~15 conversations between Carl and client. Base data was obtained from u/ ZealousidealBlock330 . This data was further refined and fine tuned. Entire dataset is synthetic. Synthetic data is used because there is little to no therapy conversation data which is publicly available and directly applicable to an LLM. This by means a no replacement to a Doctor or professional therapist. If you are in stress or going through a tough time, please seek professional help or talk to a friend/family member. Training: Entire dataset was trained on Azure 4 x A100 80GB. DeepSpeed codebase was used for training purpose. Models were trained on Llama-1 & 2 by Meta. GGML Quant model (carl-llama-2-13b) was trained by Feanix. Extremely thankful to him. Extremely thankful to the opensource community and u/faldore , Pankaj Mathur , Tom "TheBloke" Jobbins ( u/The-Bloke ), /u/kaiokendev for guiding me through this community and through 'X'. If you find mistakes in the model then they are solely mine. I am looking forward to collaborate with like minded people to release many other models. Thank you submitted by /u/ajibawa-2023 [link] [comments]  ( 9 min )
    [P][D] Fine-Tuning for Granular Control on Object Summarization
    I am building a LLM assistant to summarize JSON objects. The model should take as input (1) a JSON object, (2) a structured summarization template from the end user. The goal is to allow the end user granular control over "how" the model should conduct the summarization. I'm familiar with the benefits of fine-tuning for simple tasks. But this use case seems to involve several different sub-tasks. So, before investing in fine-tuning, I thought I'd ask for the group's advice. Can I expect decent results from fine-tuning an LLM (e.g. LLAMA-7B or 13B) on annotated examples (JSON + Template + Expected Result) alone? Thank you! Example JSON Object { "steps": [ { "name": "Step 1", "paragraph-1": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Quisque fermentum.", "paragraph-2": "Maur…  ( 10 min )
    [N] Ensuring Reliable Few-Shot Prompt Selection for LLMs - 30% Error Reduction
    Hello Redditors! Few-shot prompting is a pretty common technique used for LLMs. By providing a few examples of your data in the prompt, the model learns "on the fly" and produces better results -- but what happens if the examples you provide are error-prone? I spent some time playing around with Open AI's davinci LLM and I discovered that real-world data is messy and full of issues, which led to poor quality few-shot prompts and unreliable LLM predictions. ​ Unreliable prompts lead to unreliable predictions. I wrote up a quick article that shows how I used data-centric AI to automatically clean the noisy examples pool in order to create higher quality few-shot prompts. The resulting predictions had 37% fewer errors than the same LLM using few-shot prompts from the noisy examples pool. Let me know what you think! submitted by /u/cmauck10 [link] [comments]  ( 9 min )
    [D] What's the best English based Voice Cloning Model?
    I am exploring a couple of Voice Cloning Models for Text to Speech but haven't had much success. I have tried -- serp-ai bark voice clone - https://github.com/serp-ai/bark-with-voice-clone/ and a couple of models from TTS like - https://github.com/coqui-ai/tts Are there any good models which work well for voice cloning with English speakers? submitted by /u/apple_pie0306 [link] [comments]  ( 9 min )
    [D] ML projects for showcase
    Hey guys I am pursuing my masters and I have learned and created projects on Supervised, Unsupervised as well as Reinforcement Learning. I am enrolled in OMSCS from Georgia Tech so the course work is intense (few people might know this). I want to do some projects which I can display in interviews which will also give me real time project experience and provide me intermediate experienced engineer in ML (I know can't be done with a project, I am eager to find the path to learn more, Masters helps though to clear the basics.) Any suggestions for the projects, kaggle is filled with classification projects which I have done alot? This semester I am talking deep learning so neural networks in deep will be covered this semester. submitted by /u/Latter_Ad_5679 [link] [comments]  ( 9 min )
    [D] Best low price service for training ML DL CNN models in academia?
    I'm a student working in a bioinformatics lab. I'm working in some CNN, and colab free it's not enough for what I'm doing. This models will never be deployed or constantly be retrained or used in production of any kind, is just for research. The director of my lab is willing to start paying for a service so I can train my models... The thing is we are in Argentina (we are in a huge economical crisis), so we don't have much money to spend. I'm looking for a service with good GPUs but with a fixed price, like an anual subscription. I cannot pay each time I need to train a model. I know colab pro+ it's a good option, do you think is the best option? (My lab will eventually buy GPUs, but in the meantime I need to continue working) Edit: I need to be able to train models for many hours, without interacting with the notebook. And I don't need an amazing GPU, I wouldn't care to use a T4 and let it running for 40 hours. submitted by /u/simio_canoa [link] [comments]  ( 9 min )
    [R] How to treat uncertainty in data input for energy prediction using ML?
    Hello everyone. I am an incoming PhD student in Energy Engineering, and I am currently creating my doctoral proposal. Take note: my knowledge of ML is superficial. I am trying to characterise building elements related to energy systems at the district level, however, the only data that I have is coming from Energy Performance Certificates (EPCs) coming from the regional authority. The data has some sort of anonymity to comply with GDPR purposes, for example, the name of occupants and the exact location is hidden. I am trying to use this data to determine the building characteristics of a district using Machine Learning. After all these data are characterised in individual buildings in the district, I will use physics-informed energy simulation to predict energy demand. However, the input data from EPCs is very uncertain because there is a significant portion that just uses median values. Previous attempts to use this data give a large percentage error (78%) when predicting energy demand in the district as compared to actual recorded energy demand values. Assuming all the previous methods are correct, is there a way to treat this data uncertainty, if I know the gap between the prediction and actual performance? What specific statistical methods should I explore to refine the data and close the gap between the prediction and actual performance? Thank you so much! submitted by /u/DivinePalaDean [link] [comments]  ( 9 min )
    [R] DETR Doesn't Need Multi-Scale or Locality Design
    [2308.01904] DETR Doesn't Need Multi-Scale or Locality Design (arxiv.org) This paper presents an improved DETR detector that maintains a "plain" nature: using a single-scale feature map and global cross-attention calculations without specific locality constraints, in contrast to previous leading DETR-based detectors that reintroduce architectural inductive biases of multi-scale and locality into the decoder. We show that two simple technologies are surprisingly effective within a plain design to compensate for the lack of multi-scale feature maps and locality constraints. The first is a box-to-pixel relative position bias (BoxRPB) term added to the cross-attention formulation, which well guides each query to attend to the corresponding object region while also providing encoding flexibility. The second is masked image modeling (MIM)-based backbone pre-training which helps learn representation with fine-grained localization ability and proves crucial for remedying dependencies on the multi-scale feature maps. By incorporating these technologies and recent advancements in training and problem formation, the improved "plain" DETR showed exceptional improvements over the original DETR detector. By leveraging the Object365 dataset for pre-training, it achieved 63.9 mAP accuracy using a Swin-L backbone, which is highly competitive with state-of-the-art detectors which all heavily rely on multi-scale feature maps and region-based feature extraction. Code is available at this https URL . submitted by /u/ancientmooner [link] [comments]  ( 9 min )
    [Discussion] an open-source framework for testing and fact-checking LLMs
    Korrect is an open-source testing and fact-checking framework for LLMs. We are currently looking for contributors. Give us a star 🌟 and request features @ https://github.com/kortex-labs/korrect submitted by /u/kanxx030 [link] [comments]  ( 9 min )
    [P] where can I find an open source BigGan code that's not outdated?
    No luck on GitHub, at this point anything will do. Thank you in advance. submitted by /u/wara2dawali [link] [comments]  ( 9 min )
    [D] On-the-fly Fourier transform
    Can anyone please explain to me the difference between the so called on the fly Fourier transform and traditional Fourier transform? submitted by /u/TobinC1 [link] [comments]  ( 9 min )
    Bayesian Flow Networks
    submitted by /u/albertzeyer [link] [comments]  ( 8 min )
    [P] Need Help Predicting Key Positions in a Chrome Dino-like Game
    Hi everyone, I'm working on a project where I'm trying to predict the next key position (up or down) in a game similar to Chrome's Dino game. The data consists of a binary sequence representing key presses, with 1 for "up" and 0 for "down." I have a dataset with 18k+ rows (can be increased to 30k+ if needed) and am aiming for a prediction accuracy of 98%+. What I've Tried So Far: Preprocessing: Transformed the data into sequences of length 10 to predict the next value, splitting 70% for training and 30% for testing. Logistic Regression: Started with a simple logistic regression model using Scikit-learn but only achieved an accuracy of ~53%. LSTM Model: Tried an LSTM model with Keras, consisting of 50 units and a sigmoid activation for binary classification. The results were similar, with an accuracy of ~53%. Context and Challenges: The game's key positions are analogous to the Chrome Dino game, where the past path is known, but the path upfront is unknown. There should be some patterns in the dataset related to the timing and reaction to specific obstacles, but I'm struggling to capture them. I've considered experimenting with different sequence lengths, more complex models like multiple LSTM layers or Conv1D submitted by /u/SnooTigers4634 [link] [comments]  ( 9 min )
    [D] Dissecting BARK - whats inside SOTA Text-to-Speech
    Hi, I put some of my notes on SOTA text-to-speech, specifically BARK, into a blog post: https://balacoon.com/blog/dissecting_bark/ Let me know what you think Kind regards submitted by /u/clementruhm [link] [comments]  ( 9 min )
    [R] Neel Nanda (DM, Anthropic) develops a ML research question end-to-end in a 5h stream
    Video Disclosure: I worked with Neel when he was at Anthropic. Neel's a superb researcher, and the only one I know of at this level to document his process in this kind of depth. His channel's an incredible resource for anyone starting out in ML research, especially folk - high-schoolers, undergrads, indies - without access to mentors. submitted by /u/andyljones [link] [comments]  ( 9 min )
    [D] Is it possible to also quantify "epistemic uncertainty" for denoising diffusion probabilistic models?
    I'm currently applying DDPM/score-based generative model to my dataset, one problem I'm trying to solve is the very limited training size, so I wonder if there's any work that could also quantify the epistemic uncertainty for these kinds of generative models. For example, if the data come from a mixture of Gaussian, but I only get one sample from the distribution, then after DDPM training, the sample generated from the model would be exactly the same as the training point. Is it possible to assign a 'prior distribution' like a Gaussian distribution for the underlying distribution, so that if the training set is small, then the trained DDPM would produce samples from the prior, but with more training data seen by the DDPM, the model could produce samples from the true underlying distribution? submitted by /u/alayaMatrix [link] [comments]  ( 9 min )
    [D] MLOps: what options are available for GPU allocation
    I currently work in a mid-size company with 2-3 servers (no cloud) per department, with 1-4 GPUs each. For now, we have a chat room where we “reserve” a GPU in one of the servers of the corresponding department. But in really, this is only a plea to not use it while you are training your model. Moreover, there is another issue, resource under-utilization. There are some times where teams in one department are fighting for computer resources within their servers, while other department servers are not under full utilization. The issue is well known in the company and we are exploring the options (other than cloud) to solve this. Up until now, I came across Kubeflow but seems too complicated for what we want. Are there any other alternatives? submitted by /u/javyep [link] [comments]  ( 9 min )
    [P] OpenAI Notebooks which are really helpful.
    The OpenAI cookbook is one of the most underrated and underused developer resources available today. Here are 7 notebooks you should know about: Improve LLM reliability: https://github.com/openai/openai-cookbook/blob/main/techniques_to_improve_reliability.md Embedding long text inputs: https://github.com/openai/openai-cookbook/blob/main/examples/Embedding_long_inputs.ipynb Dynamic masks with DALLE: https://github.com/openai/openai-cookbook/blob/main/examples/dalle/How_to_create_dynamic_masks_with_DALL-E_and_Segment_Anything.ipynb Function calling to find places nearby: https://github.com/openai/openai-cookbook/blob/main/examples/Function_calling_finding_nearby_places.ipynb Visualize embeddings in 3D: https://github.com/openai/openai-cookbook/blob/main/examples/Visualizing_embeddings_in_3D.ipynb Pre and post-processing of Whisper transcripts: https://github.com/openai/openai-cookbook/blob/main/examples/Whisper_processing_guide.ipynb Search, Retrieval, and Chat: https://github.com/openai/openai-cookbook/blob/main/examples/Question_answering_using_a_search_API.ipynb Big thanks to the creators of these notebooks! submitted by /u/vishank97 [link] [comments]  ( 9 min )
    [R] Instruction-tuned Large Language Models in Multiple Languages with RLHF
    We've released our Okapi framework that introduces resources and models for instruction tuning for large language models (LLMs) with reinforcement learning from human feedback (RLHF) in 26 languages. Okapi supports 8 high-resource languages, 11 medium-resource languages, and 7 low-resource languages. Our resources include instruction data, response ranking data for RLHF, and evaluation benchmark datasets in 26 languages. Our datasets can be used to measure the progress of LLMs in these languages. https://github.com/nlp-uoregon/Okapi https://arxiv.org/abs/2307.16039 submitted by /u/itnguyen2015 [link] [comments]  ( 9 min )
  • Open

    STUDY: Socially aware temporally causal decoder recommender systems
    Posted by Eltayeb Ahmed, Research Engineer, and Subhrajit Roy, Senior Research Scientist, Google Research Reading has many benefits for young students, such as better linguistic and life skills, and reading for pleasure has been shown to correlate with academic success. Furthermore students have reported improved emotional wellbeing from reading, as well as better general knowledge and better understanding of other cultures. With the vast amount of reading material both online and off, finding age-appropriate, relevant and engaging content can be a challenging task, but helping students do so is a necessary step to engage them in reading. Effective recommendations that present students with relevant reading material helps keep students reading, and this is where machine learning (ML) c…  ( 94 min )
    STUDY: Socially aware temporally causal decoder recommender systems
    Posted by Eltayeb Ahmed, Research Engineer, and Subhrajit Roy, Senior Research Scientist, Google Research Reading has many benefits for young students, such as better linguistic and life skills, and reading for pleasure has been shown to correlate with academic success. Furthermore students have reported improved emotional wellbeing from reading, as well as better general knowledge and better understanding of other cultures. With the vast amount of reading material both online and off, finding age-appropriate, relevant and engaging content can be a challenging task, but helping students do so is a necessary step to engage them in reading. Effective recommendations that present students with relevant reading material helps keep students reading, and this is where machine learning (ML) c…  ( 94 min )
  • Open

    How Amazon Shopping uses Amazon Rekognition Content Moderation to review harmful images in product reviews
    Customers are increasingly turning to product reviews to make informed decisions in their shopping journey, whether they’re purchasing everyday items like a kitchen towel or making major purchases like buying a car. These reviews have transformed into an essential source of information, enabling shoppers to access the opinions and experiences of other customers. As a […]  ( 6 min )
    Intelligent video and audio Q&A with multilingual support using LLMs on Amazon SageMaker
    Digital assets are vital visual representations of products, services, culture, and brand identity for businesses in an increasingly digital world. Digital assets, together with recorded user behavior, can facilitate customer engagement by offering interactive and personalized experiences, allowing companies to connect with their target audience on a deeper level. Efficiently discovering and searching for specific […]  ( 16 min )
  • Open

    Demystifying Logistic Regression: A Simple Guide
    Demystifying Logistic Regression: Your Gateway to Binary Classification in Machine Learning  ( 11 min )
  • Open

    DSC Weekly 15 August 2023
    Announcements Top Stories In-Depth The post DSC Weekly 15 August 2023 appeared first on Data Science Central.  ( 20 min )
    Data-driven solutions to creating a net-zero office space
    A net-zero office space produces emissions equal to or less than the amount it removes from the atmosphere. Options for achieving that goal include using renewable energy and reducing waste. Data-driven actions can help decision-makers reach their net-zero goals. Identify unnecessary energy usage An office can become more emissions-intensive than people realize if they don’t… Read More »Data-driven solutions to creating a net-zero office space The post Data-driven solutions to creating a net-zero office space appeared first on Data Science Central.  ( 19 min )
    Understand the ACID and BASE in modern data engineering
    Introduction Dear Data Engineers, this article is a very interesting topic. Let me give some flashback; a few years ago, someone in the discussion coined the new word how ACID and BASE properties of DATA. Suddenly drop silence in the room. Everyone started staring at each other faces, few of them started saying H2SO4, HCL,… Read More »Understand the ACID and BASE in modern data engineering The post Understand the ACID and BASE in modern data engineering appeared first on Data Science Central.  ( 24 min )
  • Open

    Compatibility Issues with CityLearn 2.0b4, Gym 0.26.1, and Stable Baselines 3 2.0.0
    I am currently working on a project involving CityLearn (version 2.0b4), Stable Baselines 3 (version 2.0.0), and Gym (version 0.26.1), and I have encountered an issue that I'm struggling to resolve. Here's a brief overview of my setup: Operating System: Windows Python Version: 3.10 Libraries: CityLearn: 2.0b4 Stable Baselines 3: 2.0.0 Gym: 0.26.1 I am facing the following error: ValueError: not enough values to unpack (expected 2, got 1) The error occurs when I attempt to run the code provided in the official CityLearn documentation(https://www.citylearn.net/quickstart.html). I have not modified a single line of the code, and I'm using the exact code snippet provided: from stable_baselines3.sac import SAC from citylearn.citylearn import CityLearnEnv from citylearn.wrappers import NormalizedObservationWrapper, StableBaselines3Wrapper dataset_name = 'baeda_3dem' env = CityLearnEnv(dataset_name, central_agent=True, simulation_end_time_step=1000) env = NormalizedObservationWrapper(env) env = StableBaselines3Wrapper(env) model = SAC('MlpPolicy', env) model.learn(total_timesteps=env.time_steps*2) observations = env.reset() while not env.done: actions, _ = model.predict(observations, deterministic=True) observations, _, _, _ = env.step(actions) kpis = env.evaluate().pivot(index='cost_function', columns='name', values='value') kpis = kpis.dropna(how='all') display(kpis) I have tried various solutions and referred to the official documentation, but I am unable to find a compatible version combination that resolves this issue. If anyone has experience with these libraries and can provide guidance or suggestions, I would greatly appreciate it. I'm following the instructions exactly as provided in the official documentation, so I'm puzzled as to why I'm encountering this issue. Thank you for your time and assistance! submitted by /u/zeno9698 [link] [comments]  ( 9 min )
    [CFP] NeurIPS 2023 Workshop on Goal-Conditioned RL: 5 min video or 2 page paper
    submitted by /u/b_eysenbach [link] [comments]  ( 8 min )
    How to handle environment where the state is always externally given
    Hi all, I started to use Gynasium with stable-baselines 3 for reinforcement learning and I have a basic question. Normally in RL you have a state then you take an action and you get an reward and a new state back in the step function of the gynasium environment. In my case things are a little different. The state is always externally given (by reading from a file), then an action is executed in an external environment and a new state is calculated. However, in the next iteration, not the resulting last state used but a new externally given state. So my quesiton now is how to handle this using gynsasium and stable-baselines 3? Shall I just return nothing as the state after the action has been executed in the external environment? Or shall I alaways return the resulting state that is however "overwritten" in the next iteration. submitted by /u/PBerit [link] [comments]  ( 9 min )
    "CausalLM is not optimal for in-context learning", Ding et al 2023 {G}
    submitted by /u/gwern [link] [comments]  ( 9 min )
  • Open

    Self-written network performs well with Sigmoid / MSE, but poorly with Softmax / Cross-Entropy - any ideas why?
    Hello everyone! I'm quite new to deep learning, and mostly self-taught, so if anyone can suggest some good resources or books which are relevant here, that would be great! Over the past few days I've been coding up a small Multilayer Perceptron from scratch in C++, and training it on MNIST. I've been able to implement a network which performs pretty well, getting ~94% accuracy on the test dataset. I read that Cross-Entropy is a better loss function than MSE for categorical classification, so I've been trying to implement that, along with Softmax as an output layer activation function, but I've been met with very underwhelming results. I've played around with learning rates, the number of layers and layer sizes, batch sizes, initial values for weights / biases, and nothing seems to help.…  ( 10 min )
  • Open

    AI models are powerful, but are they biologically plausible?
    A new study bridging neuroscience and machine learning offers insights into the potential role of astrocytes in the human brain.  ( 10 min )
  • Open

    Quality Control Patrol: Startup Builds Models for Detecting Vehicle Failure Patterns
    When it comes to preserving profit margins, data scientists for vehicle and parts manufacturers are sitting in the driver’s seat. Viaduct, which develops models for time-series inference, is helping enterprises harvest failure insights from the data captured on today’s connected cars. It does so by tapping into sensor data and making correlations. The four-year-old startup, Read article >  ( 6 min )
    Best-in-Class is in Session: New NVIDIA Studio Laptops Supercharge Content, Gaming and Education
    The start of a new school year is an ideal time for students to upgrade their content creation, gaming and educational capabilities by picking up an NVIDIA Studio laptop, powered by GeForce RTX 40 Series graphics cards.  ( 8 min )
  • Open

    Random slices of a sphube
    Ben Grimmer posted something yesterday on Twitter: A nice mathematical puzzle🧐 If you take a 4-norm ball and cut it carefully, you will find a two-norm ball. 3D printed visual evidence below. The puzzle: Why does this happen and how much more generally does it happen? (This question was first posed to me by Pablo […] Random slices of a sphube first appeared on John D. Cook.  ( 6 min )
    Twin stars and twin primes
    Are there more twin stars or twin primes? If the twin prime conjecture is true, there are an infinite number of twin primes, and that would settle the question. We don’t know whether there are infinitely many twin primes, and it’s a little challenging to find any results on how many twin primes we’re sure […] Twin stars and twin primes first appeared on John D. Cook.  ( 5 min )

  • Open

    [D] AI Jobs are paying a hell of a lot of Money.
    AI Jobs are paying a hell of a lot of Money. - Companies across industries like entertainment, retail, and manufacturing are engaging in an "AI recruiting frenzy" to hire data scientists, machine learning experts, and other AI talent. ​ - Demand for AI skills is driving salaries up, with some roles offering compensation packages approaching $1 million. Jobs at companies like Netflix, Match Group, Amazon, and Walmart advertise base salaries from $250k to $400k. ​ - Total compensation with bonuses and stock grants can be much higher than base salaries. But average pay for roles like prompt engineers is around $130k total. ​ - Various factors make AI talent scarce - limited supply, competition from many industries, and candidates with multiple offers. Mid and senior level roles are hardest to fill. ​ - Companies are using tactics like acquisition of AI startups, internal training programs, and pitching impactful work to attract candidates. ​ - Recruiters say AI engineers and product managers can be very selective about roles, caring about the meaningfulness of the work. ​ - Firms realize they cannot just hire their way out of the AI talent crunch. Retention and internal development will also be key. ​ In summary, surging corporate AI demand against limited talent supply has created a hyper-competitive market where top AI professionals can command extreme compensation. submitted by /u/Yavero [link] [comments]  ( 9 min )
    [P] Synthetic Duplicates of Confidential Datasets
    Hey guys! Just released a small pip package based off some research I've been doing at school. The package is super simple, but enables devs to take an existing dataset and easily create a synthetic, privacy-preserving duplicate of this dataset. Not exactly sure where this lies in the data pipeline yet, but I imagine in the grand scheme of things some tech like this + a bunch of other features I want to add could be helpful in some way along the lines of data sharing/accessibility? I'd love some feedback and would even appreciate chatting with some folks for a few minutes to hear any advice you guys might have for me moving forward. This is the link to check it out: https://pypi.org/project/verisptab/. Thanks! submitted by /u/avnertothemoon [link] [comments]  ( 9 min )
    [D] periodicals for keeping up to date
    What are your favorite resources for keeping up to date without reading every paper that comes out? Tips and tricks columns etc. I've recently transitioned to a job in medical images and computer vision from my material science phd. I have some experience but not a machine learning PhD. I've been trying to keep up to date and linearlayer.substack.com seemed like a pretty good sub. I'm looking for more. Are there any niche periodical substacks, podcasts or forums you enjoy? submitted by /u/VooDooZulu [link] [comments]  ( 9 min )
    [P] Nba season dataset
    Just got my hands on basically every data point possible for every game from the 2022-2023 season, what should I do with it? submitted by /u/michaelc143 [link] [comments]  ( 9 min )
    [P] TeaRoute - Trainable, efficient, autonomous routing for LLM queries.
    Hi r/MachineLearning, I'm happy to show a new project I've been working on called TeaRoute (Trainable, autonomous, efficient routing). Tear is a simple tool that allows you to easily control the flow of information through LLMs via automated text classification. With just a few (~10) lines of code, you can set up a router that classifies text inputs and directs them to different LLM endpoints for processing. This makes it easier to build things like: Chatbots that route questions to different departments Multi-document question answering systems Dynamic classification models that improve over time Some of the key features of TeaRoute: Classification based on embeddings for high efficiency and low cost Option to use LLMs for classification when needed (and add that to training corpus) Easy setup in around 10 lines of code I've open sourced TeaRoute and documented it fully, with examples like building a movie chatbot router. You can check out the code here: https://github.com/kesile/TeaRoute. You can install it into your program with "pip install TeaRoute". Let me know if you end up building something cool with TeaRoute! And feel free to open issues on GitHub if you run into any problems. Let me know your thoughts! submitted by /u/Rejg [link] [comments]  ( 9 min )
    [D] Implementing real time transcription
    I am working on a project where we need real time transcription of speech (mostly input through microphone). So the workflow is: 1-User starts speaking 2-Live transcription of their speech appears on screen By "real time/live" I mean the latency should not exceed 5 seconds, ideally much less while maintaining maximum accuracy. My question is: how do I achieve this? I have been experimenting with openai's whisper but I am not sure how can I get it to work with real time audio input since the model expects 30-second segments (preferably containing full sentences). My main challenge is how should I segment the audio. Should I use a VAD to split on silences? (I tried this but the transcription accuracy is lower since whisper doesn't have access to context outside of each segment) Also another question I have is how does hugging face automated speech recognition pipeline transcribe long audio files. I tried to read the documentation but cannot figure it out. Thanks in advance submitted by /u/Amgadoz [link] [comments]  ( 9 min )
    [Project] GPU-Accelerated LLM on a $100 Orange Pi
    Progress in open language models has been catalyzing innovation across question-answering, translation, and creative tasks. While current solutions demand high-end desktop GPUs to achieve satisfactory performance, to unleash LLMs for everyday use, we wanted to understand how usable we could deploy them on the affordable embedded devices. Many embedded devices come with mobile GPUs that can serve as a source of acceleration. In this project, we pick Orange Pi 5, a RK35888-based board that is similar to Raspberry Pi but also features a more powerful Mali-G610 GPU. We 5 tok/sec for RedPajama-3b and 2.5 tok/sec for Llama2-7b. We can also get to 1.5 tok/sec on a 16GB version of the Orange Pi 5+ under $150. - Project: https://github.com/mlc-ai/mlc-llm - Blogpost: https://blog.mlc.ai/2023/08/09/GPU-Accelerated-LLM-on-Orange-Pi ​ ​ submitted by /u/crowwork [link] [comments]  ( 9 min )
    [N] MiniWoB++ v1.0 - Web interaction environments for RL
    We are releasing the mature 1.0 version of MiniWoB++ (Mini World of Bits++), an RL benchmark containing over 100 web interaction environments, ranging from simple button clicks to more complex forms and web apps. The environments were released by OpenAI back in 2017 as just HTML pages. With MiniWoB++, the environments run on a browser, and an RL agent can get the environment states or execute actions via Selenium WebDriver. This version of MiniWoB++ contains the following features: * Over 100 web environments, including 2 bigger environments based on real websites and 18 previously unavailable “test set” environments. All (but 4) environments are deterministic for the given random seed. * Full integration with Gymnasium, a fork of OpenAI Gym, which provides a standardized API for RL. * A wide range of implemented browser actions including clicking, dragging, scrolling, typing, and pressing keyboard shortcuts, all of which can be customized (e.g., coordinate binning or scrolling speed). Tweet: https://twitter.com/FaramaFound/status/1691135031798804480?s=20 Release notes: https://github.com/Farama-Foundation/miniwob-plusplus/releases/tag/v1.0 Documentation: https://miniwob.farama.org/ submitted by /u/elliottower [link] [comments]  ( 9 min )
    [P] Vision-based reinforcement learning for Trackmania: close or at superhuman level
    We used model-free, value-based reinforcement learning (mostly dueling-IQN) to train an AI that plays Trackmania. The system is mostly vision based, along with some information taken from the game engine such as car speed and acceleration. On our simple training track, we believe we are close to or above human level. We have yet to find a human who sets a better racing time than our AI's. We tried many extensions to the algorithm (noisy, persistent advantage learning, munchausen, ...), but none of these extensions improved the ultimate performance of our AI. Link the the video, we're pretty proud of this result. :) ​ submitted by /u/Linesight_rl [link] [comments]  ( 9 min )
    [P] Classifying Energy News with Machine Learning: a Multi-Label Problem solved using Binary Relevance with XGBoost
    Hi everybody, I want to share with this community a recent challenge involving machine learning that I've faced. Over the past few weeks, I've been busy creating a machine learning model. The goal? To classify news, mostly related to the energy sector, into 20 different topics. This task is a bit tricky since a news piece might fall into more than one category, so it's what's called a multi-label problem. Until now, I was doing this just using just keywords, but I wanted to move to something much better and robust. So basically, I started by taking 800 headlines and summaries and classifying them manually one by one into these 20 categories. After that, I began building the model in Python using scikit-learn. I tried different methods like logistic regression, random forest, SVM, etc. Af…  ( 10 min )
    Community Events [Project]
    Hi u/all am thinking of creating a series of interesting events for the broader AI-Community. I always felt most events are either too expensive or too much of a company pitch. The topics should vary each time but for sure cover technical deep dives, some high level talks and some possibility to network and get together. I already have some company sponsors who are just interested in showing their name in the venue but don't want to interact/get information on participants or anything. ​ What do you think - is this worth a shot? What ideas do you have wrt topics, speakers? submitted by /u/CarlCarter312 [link] [comments]  ( 9 min )
    [R] Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
    submitted by /u/KingsmanVince [link] [comments]  ( 9 min )
    [D] ways to model text and their strengths.
    Hello, I'm trying to write a seq2seq model for text. I can think of many text to vector mappings, but I'm wondering are there better ones? Codes: convert text to unicode codepoints, and represent it as 1m-ary classification Bits: convert text to utf-8, and represent it as a binary classification (0,1) Bytes: convert text to utf-8 and represent it as 256-ary classification Chars Embedding: identify small sequences and embed into space Word Embedding: identify words from a dictionary and embed into space (unknown words?) submitted by /u/windoze [link] [comments]  ( 9 min )
    [News] NVIDIA finally releases Neuralangelo's source code!
    submitted by /u/RegularConstant [link] [comments]  ( 8 min )
    [R] Open Source Announcement and The Current State of Generative 3D (Aug 2023)
    At Mirage, we’ve kept our platform at the edge of generative 3D starting with CLIP-Mesh by Nasir → GET3D by NVIDIA → Stable Dreamfusion by Google (implemented by ashawkey) → Point-E by OpenAI → Shap-E by OpenAI. We’ve spent time optimizing these open-source repos to produce usable game assets in multiple formats (recently GLTF). Today, we are open-sourcing Mirage3D. This repo is built upon the fantastic work of the folks mentioned above and is optimized to create GLB files easily. https://github.com/MirageML/Mirage3D Try out the repo yourself and consider contributing to help create a single source of truth for open-source generative 3D models, optimized to create usable 3D assets! The Problem Generative 3D is progressing slower than other modalities (audio, video, image, etc) due to …  ( 10 min )
  • Open

    DIY Custom AI Chatbot for Business (open source)
    If you're looking to train a custom chatbot on your data (SOPs, legal docs, financial reports, etc), I'd strongly suggest checking out AnythingLLM. It's the first chatbot with enterprise-grade privacy & security. When using ChatGPT, OpenAI collects your data including: Prompts & Conversations Geolocation data Network activity information Commercial information e.g. transaction history Identifiers e.g. contact details Device and browser cookies Log data (IP address etc.) However, if you use their API to interact with their LLMs like gpt-3.5 or gpt-4, your data is NOT collected. This is exactly why you should build your own private & secure chatbot. That may sound difficult, but Mintplex Labs (backed by Y-Combinator) just released AnythingLLM, which gives you the ability to build a chatbot in 10 minutes without code. AnythingLLM provides you with the tools to easily build and manage your own private chatbot using API keys. Plus, you can expand your chatbot’s knowledge by importing data such as PDFs, emails, etc. This can be confidential data as only you have access to the database. ChatGPT currently allows you to upload PDFs, videos and other data to ChatGPT via vulnerable plug-ins, BUT there is no way to determine if that data is secure or even know where it’s stored. Easily build your own business-compliant and secure chatbot at http://useanything.com/. All you need is an OpenAI or Azure OpenAI API key. Or, if you prefer using the open source code yourself, here’s the GitHub repo: https://github.com/Mintplex-Labs/anything-llm. https://preview.redd.it/r2qf685bf5ib1.png?width=1200&format=png&auto=webp&s=e1fe809338dd5e76c0c82e1fcbd2cf0afe957eb2 submitted by /u/rue_so [link] [comments]  ( 9 min )
    Overview of the OWASP Top 10 for LLMs
    submitted by /u/confusedcrib [link] [comments]  ( 8 min )
    Jobs will not be lost because AI is getting smarter. People are getting dumber instead.
    This article refers that everyone's busy debating whether AI's going to steal jobs due to its superiority. But it is not just about AI outshining humans; it's about the decline in good old human competence. What are your thoughts on this? submitted by /u/Powerful-Pumpkin-938 [link] [comments]  ( 9 min )
    What's your opinion on a network of experts being the correct way to go?
    GPT-4 has been top dog for a while, and it's said to be a network of experts architecture. Even the human brain is similar to some degree with different "sections" of the brain or cortexes being specialized for specific sensory input. It's always made sense to me that a quick way to come to a "good enough" AI is to train many smaller expert AIs and then train a umbrella AI that just delegates parts of the task to different experts, and organizes the output. The next step would obviously be training that umbrella AI to make it's own expert AIs in an unfamiliar supervised fashion. Is anyone already working on something like this? Do you think it's a worthy avenue of research? submitted by /u/TrainquilOasis1423 [link] [comments]  ( 9 min )
    What tools are the best for people starting there own brand??
    Hi there, I know it’s a bit of a broad questions and I have done some research, but have come across a huge amount of Ai tools, that I am unsure what ones are good and what ones are not. A few of the things I am after: Text to videos Creating designs for clothing Thanks!! submitted by /u/redoutraged [link] [comments]  ( 9 min )
    AI is going to eliminate way more jobs than anyone realizes
    submitted by /u/thisisinsider [link] [comments]  ( 8 min )
    Best free ai text to image apps right now?
    I want to download an ai app that is free that does text to image with no limit on usage. Any advice would be good! Thanks! submitted by /u/DrowsyDrowsy [link] [comments]  ( 8 min )
    Where the jobs at?
    I love AI, I'm not dumb, but I don't yet code. What sort of jobs are there for a guy like me (45, world-class video engineer and decent TV producer) in this realm? What are the jobs at all? I'm so curious about the whole operation from the bottom up! submitted by /u/beebo135 [link] [comments]  ( 9 min )
    How to Identify Real Diamonds From Cubic Zirconia
    submitted by /u/HumanityFirst16 [link] [comments]  ( 8 min )
    Do you know what AI makes these?
    They are so pretty, but I know next to nothing about AI. I’m pretty bad with tech overall, but if I can find the name of the AI, I have someone who can help me use it. submitted by /u/ChrisMSpink [link] [comments]  ( 8 min )
    Exploring OpenAI's Mastery Over DOTA 2: A Deep Dive into Machine Learning's Pinnacle in Competitive Gaming. #AIEsportsRevolution
    submitted by /u/stefanbg92 [link] [comments]  ( 8 min )
  • Open

    PPO Tensorboard loss functions (Part 2)
    In my previous post ( Reinforcement Learning (reddit.com) ) I had a hard time understand the differences between the loss functions in Tensorboard, and thanks to the members here, after normalizing and changing the reward functions, they became clearer to me. I'd like first to give perspective of my custom gym environment. I use PPO2 algorithm ( n_steps=512 , nminibatches=8 ) as from my understanding it means the weights will get updated within these steps instead of waiting for an episode to finish. Scenario 1: use the first 5000 rows from my dataset, I get the following results: 5k rows As seen, the entropy loss seems to increase which means the policy isn't learning and has a lot of randomness. However, the loss seems to decrease, but I'm not sure if I should continue training for more steps and it will converge eventually! Scenario 2: use the 50,000 rows from my dataset, I get the following results: 50k rows As seen, the loss function goes to zero very quickly, then jump really high and return back to zero, I don't understand this behavior as the mode seems to not still learning! Can someone please help me understand what's happening? submitted by /u/Acceptable_Egg6552 [link] [comments]  ( 9 min )
    [P] MiniWoB++ v1.0 - Web interaction environments for RL
    We are releasing the mature 1.0 version of MiniWoB++ (Mini World of Bits++), an RL benchmark containing over 100 web interaction environments, ranging from simple button clicks to more complex forms and web apps. The environments were released by OpenAI back in 2017 as just HTML pages. With MiniWoB++, the environments run on a browser, and an RL agent can get the environment states or execute actions via Selenium WebDriver. This version of MiniWoB++ contains the following features: * Over 100 web environments, including 2 bigger environments based on real websites and 18 previously unavailable “test set” environments. All (but 4) environments are deterministic for the given random seed. * Full integration with Gymnasium, a fork of OpenAI Gym, which provides a standardized API for RL. * A wide range of implemented browser actions including clicking, dragging, scrolling, typing, and pressing keyboard shortcuts, all of which can be customized (e.g., coordinate binning or scrolling speed). Tweet: https://twitter.com/FaramaFound/status/1691135031798804480?s=20 Release notes: https://github.com/Farama-Foundation/miniwob-plusplus/releases/tag/v1.0 Documentation: https://miniwob.farama.org/ submitted by /u/elliottower [link] [comments]  ( 9 min )
    I made "Connect 4" with Pygame and DRL
    submitted by /u/Disastrous-Ladder-46 [link] [comments]  ( 9 min )
    Vision-based reinforcement learning for Trackmania: close or at superhuman level
    We used model-free, value-based reinforcement learning (mostly dueling-IQN) to train an AI that plays Trackmania. The system is mostly vision based, along with some information taken from the game engine such as car speed and acceleration. On our simple training track, we believe we are close to or above human level. We have yet to find a human who sets a better racing time than our AI's. We tried many extensions to the algorithm (noisy, persistent advantage learning, munchausen, ...), but none of these extensions improved the ultimate performance of our AI. Link the the video, we're pretty proud of this result. :) ​ submitted by /u/Linesight_rl [link] [comments]  ( 9 min )
    Using RL in a videogame with only 1 state?
    I'm super new to RL, just started today amd I'm probably doing it super wrong but here goes. Basically I'm trying to get an AI command center base in Halo Infinite to apply RL in order to determine which loadout to apply to a bot defending the base you're attacking. The issue I'm running into is how to preditct the Q-values for the next state. In my map, I decided to change from one state to the next every 5 kills per team. So when that happens the Kill/death ratio of each loadout equipped by a player is updated and used as a reward in the Q-table. Well that's the idea, anyway. I wanted to use player kills as a sort of benchmark to gauge performance with a given loadout but I'm afraid the state space I applied is too small and I don't know how to transition from one state to the next and obtain the maximum reward from the expected future rewards of the next state. submitted by /u/swagonflyyyy [link] [comments]  ( 9 min )
    "First Contact: Unsupervised Human-Machine Co-Adaptation via Mutual Information Maximization", Reddy et al 2022
    submitted by /u/gwern [link] [comments]  ( 9 min )
  • Open

    Zero-shot and few-shot prompting for the BloomZ 176B foundation model with the simplified Amazon SageMaker JumpStart SDK
    Amazon SageMaker JumpStart is a machine learning (ML) hub offering algorithms, models, and ML solutions. With SageMaker JumpStart, ML practitioners can choose from a growing list of best performing and publicly available foundation models (FMs) such as BLOOM, Llama 2, Falcon-40B, Stable Diffusion, OpenLLaMA, Flan-T5/UL2, or FMs from Cohere and LightOn. In this post and […]  ( 19 min )
    Build production-ready generative AI applications for enterprise search using Haystack pipelines and Amazon SageMaker JumpStart with LLMs
    In this post, we showcase how to build an end-to-end generative AI application for enterprise search with Retrieval Augmented Generation (RAG) by using Haystack pipelines and the Falcon-40b-instruct model from Amazon SageMaker JumpStart and Amazon OpenSearch Service.  ( 11 min )
  • Open

    AI-driven predictive analytics for revenue forecasting in healthcare
    Innovation is increasingly driven by data. As technology advances and alters human behavior, industries collect a growing quantity of information. This data is valuable once we are able to extract actionable, meaningful insights from it – insights that can accelerate better outcomes while remaining equitable and inclusive of the populations we serve, allowing us to… Read More »AI-driven predictive analytics for revenue forecasting in healthcare The post AI-driven predictive analytics for revenue forecasting in healthcare appeared first on Data Science Central.  ( 21 min )
    A new era of carrier connectivity: How technology is bridging the gap
    In the logistics and transportation industry, carrier connectivity has long been challenging, often riddled with inefficiencies and communication barriers. Innovative tools and platforms are revolutionizing how carriers connect with shippers and other stakeholders, fostering real-time collaboration and transparency.  This new era of carrier connectivity enhances the flow of information and redefines how the industry operates.… Read More »A new era of carrier connectivity: How technology is bridging the gap The post A new era of carrier connectivity: How technology is bridging the gap appeared first on Data Science Central.  ( 22 min )
  • Open

    Community Events
    Hi u/all am thinking of creating a series of interesting events for the broader AI-Community. I always felt most events are either too expensive or too much of a company pitch. The topics should vary each time but for sure cover technical deep dives, some high level talks and some possibility to network and get together. I already have some company sponsors who are just interested in showing their name in the venue but don't want to interact/get information on participants or anything. ​ What do you think - is this worth a shot? What ideas do you have wrt topics, speakers? submitted by /u/CarlCarter312 [link] [comments]  ( 9 min )
    Question on creation Neural Network to forecast construction duration considering delays.
    Data I have: Project Information (Cost, location, original contract duration) Different Reasons for extension of duration and corresponding number of days (actual record of different sample projects). ​ Based on this data, will the neural network make its own analysis of the frequency and severity of each Reason for extension of duration and incorporate it in the output? submitted by /u/shearhead [link] [comments]  ( 9 min )

  • Open

    Legions of DEF CON hackers will attack generative AI models
    submitted by /u/nickb [link] [comments]  ( 8 min )
    Is it possible for a person to propose a new variety of neural networks?
    How to come up with the architecture? submitted by /u/AnyJello605 [link] [comments]  ( 8 min )
    Question about neural networks and games
    Hello everyone, I'll go straight to the question. My pixel game needs sprites, but I know practically nothing about drawing sprites, and if on the one hand I can still draw something, then 4 directional (front, back, left, right) turn out badly, and you also need to add animation to them. Knowing that there are a lot of neural networks and for almost every task, I decided to find a neural network that will solve my problem, but I couldn't. Therefore, I ask if you know a neural network that can solve my problem. Thank you in advance. submitted by /u/Ilya-33 [link] [comments]  ( 9 min )
  • Open

    [Discussion] Having trouble choosing which MLOps solution to transition to from using AWS Batch/S3/Dynamo + Tensorboard
    Overwhelmed choosing an MLOps/experiment tracking platform to move to from manual AWS/S3/Dynamo + local tensorboard Title says it all… So many choices and it’s hard to figure out which to go with. Small team, 2-3 people, data scientists/python devs. mostly training MLP regressors on datasets with 20k-100k samples (approx 100 input features, approx 30 output variables) and CNNs on datasets with 500k-1m samples (approx 10x10000 timeseries per input sample along with 50-ish input features and 4x10000 output timeseries). At the very least, upgrading our experiment tracking/results reporting from Tensorboard to a cloud platform is a big desire, however, I would also like to make our full pipeline a bit more documented/versioned, and possible simplify some of the architecture. Ideally, we c…  ( 10 min )
    [P] llama2.py
    Hi everyone, Here is a ported version of Andrej Karpathy's llama2.c into pure Python with zero dependencies. The nice thing is llama2.py essentially captures the inference logic from the Llama2 research paper. With this port in pure Python, it's so much easier to unpack the intricate concepts originally presented in a "scientific language". Designed for an extensive audience, it aims to be a straightforward "reference implementation" suitable for educational purposes.The original llama2.c repository comprises two Python files intended for model training and one C file for inference. The goal of pure Python implementation is to bridge the existing gap by offering a clear-cut reference implementation encapsulating all transformer logic within a concise Python file, not exceeding 500 lines of code. Though the original Meta/Llama is written on Python, its complexity is rather high due to multiple dependencies and sophisticated optimizations implemented within. This often makes it hard to follow, particularly for those new to the field. submitted by /u/Albatross9855 [link] [comments]  ( 9 min )
    [Project] open source flowchart for complex prompt techniques = useful?
    I've been using an abandoned block-coding framework called PyFlow to set up complex prompt chain techniques using LangChain. I found the visuals (and running multiple techniques at once) to be very helpful when comparing output quality. Would a beefed up open source version of this be useful for you and why? What are you currently doing to quickly test prompt chain techniques and measure the quality of the responses? https://preview.redd.it/ywpmjbwbvxhb1.png?width=1674&format=png&auto=webp&s=ca80a45e430b92bf31a9a9b9747c6d8ba7d3bde7 submitted by /u/copywriterpirate [link] [comments]  ( 9 min )
    Probem with running a python script that uses tflite [D]
    I have a python script on my raspberry pi zero that uses tflite to make predictions on images. it gets imported like that: import tflite_runtime.interpreter as tflite When i run the script from the terminal with sudo it gives me the error: ModuleNotFoundError: No module named 'tflite_runtime' without sudo it works just fine. Now the thing is that I need to run that script from another script using os.system("python code.py") but that gives the same error no matter if I add sudo or not. Do you have any idea hw to fix that problem? submitted by /u/Main-Associate-6457 [link] [comments]  ( 9 min )
    [D] Colab Pro no longer gives you a V100, not even a P100, you now pay for the (previously free) Tesla T4.
    submitted by /u/LumpySchool7262 [link] [comments]  ( 8 min )
    [R] Your Neural Network Doesn't Know What It Doesn't Know
    Hi everyone, I made a repo trying to collect every high-quality source for Out-of-distribution detection, ranging from articles and talks for beginners to research papers at top conferences. It also has a primer if you are not familiar with the topic. Check it out and give it a star to support me if you find it helpful. Thanks a lot ;) https://github.com/continuousml ​ https://preview.redd.it/gup7ckixhxhb1.png?width=868&format=png&auto=webp&s=e71f51bef0ff2b4f3f37e801702b5d365cbd67fd submitted by /u/Ok-Kaleidoscope-505 [link] [comments]  ( 9 min )
    [D] is there a Civitai-like service or resource for local install Tortoise_TTS files?
    Anything out there for this? submitted by /u/Duemellon [link] [comments]  ( 8 min )
    [P] Understanding Instruction Tuning for Multimodal LLMs
    submitted by /u/s_arme [link] [comments]  ( 8 min )
    Tips and tricks for publication of non-SOTA research? [Discussion]
    Say, I have a method that can be used as a drop-in replacement for a part of a network in a set of relevant tasks, for example, an improved CTC loss in speech recognition and non-autoregressive translation. I can run a bunch of experiments, applying my method in those tasks, showing that "model A + my method" works substantially better than just "model A." "Model A" is some relatively light and simple model, like Transformer-base. It might have been near-SOTA several years ago. However, the results won't be near contemporary SOTA, because SOTA models tend to be heavy and complicated. In theory, the modification can apply to bigger models, but I don't have the resources to re-implement them. So, how do I present my work? In theory, doing the "model A" experiments should be enough to show that my modification is interesting. But in practice most of the reviewers aren't very excited when the reported numbers are far from SOTA, even if the aim is not to beat SOTA, but to present a general working approach. I know that there are many papers that do not contain any SOTA-beating in them, but still manage to be successfully published. Question: how do they do that? Are there any tips and tricks? What kind of experiments/settings/datasets/models should I consider to convince the reviewers? Or at least make it harder for them to Reject-If-Not-SOTA? submitted by /u/Tomarchelone [link] [comments]  ( 9 min )
    [R] In search of big binaries
    I don't work in ML, but I work on an application that parses code and debugging information out of binaries (executables, shared libraries, etc.) that is then used by other people to build performance analysis tools that are used on ML applications. To do some performance benchmarking on my tool, I need some huge binaries to parse. Specifically, I need a binary with a lot of code (.text section in ELF) and debugging information (DWARF in ELF). I know there are lots of binaries that have a huge amount of data in them, but that doesn't help because my application just ignores that. Optimally, I need a binary with an on-disk size of greater than 5GB to get useful measurements, but greater than 1GB would still be good. I would also prefer being able to generate the binary from a source build so I can test it across the many architectures my tool supports. I have a collaborator who claims to have used my application on an 8GB TensorFlow binary. Unfortunately, that binary is on a classified system and I can't get access to it. However, that let's me know it's at least possible to make such a beast. Has anyone here seen such huge binaries using an ML application- it doesn't have to be TensorFlow? Thanks in advance! Tools I have tested thus far are the computational chemistry application NWChem and the quantum mechanics Gaussian integral application libint. However, I can't get either of these to make binaries larger than about 100MB. submitted by /u/OmegaNaughtEquals1 [link] [comments]  ( 9 min )
    [P] We built a multi-modal search app with Meta AI's ImageBind and Deep Lake (search with image, text, & audio)
    submitted by /u/davidbun [link] [comments]  ( 8 min )
    [D] Simple Questions Thread
    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator [link] [comments]  ( 9 min )
    [D] improve OCR accuracy in python
    Of course, I'd be happy to help you correct the errors in the text you provided. Here's the corrected version of your text: "Hello everyone, I'm currently working on a Python program in which I need to use an OCR library. However, the results weren't very satisfying even though the text in the image is clear and there is no noise (just text). This is an example of the images I've used:" https://preview.redd.it/flmdzz9otvhb1.png?width=515&format=png&auto=webp&s=dfd8ebab6f02369a150f7b9375e0d53baa6dceee https://preview.redd.it/q9mn21z9svhb1.png?width=515&format=png&auto=webp&s=291503b6a88c7c1dd2e639dd659d1d4b1a38e8d9 I tried changing the contrast, and the results seemed to improve. However, I'm wondering if there are other approaches to consider in Python. Additionally, I want to implement a cost function that takes into account both the execution time and the accuracy of the OCR. How can I do that? submitted by /u/Ordinary_Run_2513 [link] [comments]  ( 9 min )
    [R] Run LLama-2 13B, very fast, Locally on Low-Cost Intel ARC GPU
    submitted by /u/reps_up [link] [comments]  ( 8 min )
    [P] Rasa and Hugging Face. pre train model
    Hello. can anyone give me a hint or tell me where i can find more info to get started with my test AI? I have installed Rasa and have downloaded the files pytorch_model.bin and config.json from xlm_roberta_large from hugging face. have tried to put it in config.yml, (maybe I did something wrong) because when I run "rasa train" and "rasa shell -with the model" I only get the standard init "how are you?" which is located in the domain file. I want a slightly more natural conversation than this. submitted by /u/Professional-Push-94 [link] [comments]  ( 9 min )
    [P] How to install Kubeflow locally
    Hey, r/MachineLearning, it’s Nir from DagsHub 🐶 Our MLOps engineers have been experimenting with Kubeflow (”the MLOps version of Kubernetes”) for the past weeks to manage a training cluster on AWS. ​ https://preview.redd.it/dqi27ggq0vhb1.jpg?width=1194&format=pjpg&auto=webp&s=945c9bcaa9987d2df135e18cf081efd9f742cf0f As good engineers, we wanted to kick the tires a bit before committing cloud resources to it. Unfortunately, most installation guides we found online only showed you how to spin up a Kubeflow cluster on AWS, GCP, or Azure. We had a hard time finding clear instructions for installing it locally. A local installation would allow individuals to play around with it, experiment, and learn from it. So we started looking into it. We followed this thread until we successfully ran Kubeflow on a local machine. Since it wasn’t a super clear or easy process to figure out, we’ve written a blog post on how we did it and decided to share our insights with the ML community. You can find the blog here: https://dagshub.com/blog/how-to-install-kubeflow-locally We also have a follow-up blog that explains how to deploy a Kubeflow cluster on AWS, so if you have any insights or requests for further information - we’d love to hear from you! submitted by /u/RepresentativeCod613 [link] [comments]  ( 9 min )
    [P] Fact-checking framework for LLMs
    Hey guys, happy Sunday 🤗 I've started an open-source project for fact-checking in LLMs. Contributions are welcome: https://github.com/kortex-labs/korrect submitted by /u/kanxx030 [link] [comments]  ( 8 min )
    [P] Survey on data challenges when using ML/AI
    Hi all, I'm working on my master thesis on challenges related data acquisition and mgmt for companies using AI. If you fit the profile and can help it would be great. 2x50€ amazon vouchers draw at the end. Here is the link https://tummgmt.eu.qualtrics.com/jfe/form/SV_bl2FXTBrPe1Tn4q submitted by /u/g13e-reddit [link] [comments]  ( 9 min )
    [D]AAAI author list modification
    Can I add author to the author list between abstract deadline and submission deadline? AAAI-24 Submission Instructions says"Authors must enter the names of ALL AUTHORS at the time of registration (by abstract deadline) — CMT includes a hard-coded note that this is optional, however, this is a mandatory step for AAAI-24 authors. According to AAAI policy, all author names must be added at the time of abstract registration, and the list of names as well as the order in which they appear cannot be changed after August 15." I fell a little confusing that it says "the list of names ... cannot be changed after August 15", but also mentioned that "Authors must enter the names of ALL AUTHORS at the time of registration (by abstract deadline)". submitted by /u/No_Paramedic3606 [link] [comments]  ( 9 min )
    Explainable AI techniques for biologically inspired / plausible neural networks? [Discussion]
    submitted by /u/bluepapaya555 [link] [comments]  ( 9 min )
    [P] Llama2 Embeddings FastAPI Service
    submitted by /u/dicklesworth [link] [comments]  ( 8 min )
    [P] Use Llama2 to Improve the Accuracy of Tesseract OCR
    submitted by /u/dicklesworth [link] [comments]  ( 8 min )
  • Open

    Modeling walking and the legs
    Hello everybody, I had a question and looking for the experience from this group. I want to learn how to train models for walking. More specifically, I'm interested in looking at which muscles are active during walking. Can I use one subject for deep reinforcement learning or will I need more subjects to train the data (say 20+ subjects)? Thanks, submitted by /u/theslipguy [link] [comments]  ( 9 min )
    Stable Baselines PPO vs Ray.io PPO
    For context, I've been experimenting with different Reinforcement learning algorithms, frameworks etc. Currently I have a custom Gym environment with Stable baselines 3 to train a PPO agent. I've noticed however that this setup is very slow for training as stable Baselines doesn't properly utilize GPU + parallel environments don't really work that well due to "step lock" where it can't continue until everything is synced up. I've heard really good things about Ray/RLlib. It has more advance features + according to reports it's crazy fast for training. I want to basically rewrite my existing SB3 implementation into Ray but I don't find the documentation very user-friendly + there aren't really that much good/usefull tutorials online as far is I can find. I tried GPT-4 but it just regurgitates some very old Ray implementation. I even tried not using ray and trying Tensorflow again (I hate Tensorflow, I'm definitely team Pytorch 😬) Do you guys have any good tutorials, videos, documentation to get started with PPO on Ray? Would love to learn more, but documentation online seems very hardcore to start even 😂. Would love to know your suggestions. submitted by /u/ClassicAppropriate78 [link] [comments]  ( 9 min )
    terminated vs truncated in Gymnasium
    Hi! I am not completely sure how to use these flags from the Gymnasium API (I've always used the Gym API so far and I'm switching just now...). Particularly in the environment, I'm playing with now: It's a 1vs1 game, and an episode can end if one of the 2 players dies or a max. number of steps is reached. In this case: - The max. number of steps is a hard limit rule of the game, so if these are reached should they receive truncated or terminated signals? - If a player dies, should both agents get a terminated = True, or one should get terminated=True and the other one truncated=True, as it was still alive, and could have continued playing? ​ Thank you! ​ submitted by /u/xWh0am1 [link] [comments]  ( 9 min )
  • Open

    The possibilities of artificial intelligence, how it will shape your future, and whether it will have an impact at all?!
    submitted by /u/CoylyLard [link] [comments]  ( 8 min )
    Are there any AI LLM that are less restrictive in their answers, similar to ChatGPT on release?
    Trying to dip my toes into trying other LLMs but not truly not sure which are comparable to ChatGPT. Would love any suggestions, and maybe an explanation of why you chose that AI. submitted by /u/kokeda [link] [comments]  ( 8 min )
    Hacker exposes AI With “Terrible Math” to show defects in LLMs like GPT-4
    submitted by /u/Agitated-Spell3979 [link] [comments]  ( 8 min )
    Behind the Scenes of AI Sensationalism
    This article aims to uncover the truth behind the sensationalist AI news that's been dominating headlines. Much of what's touted as groundbreaking AI advancements can be thought as mere clickbait designed to captivate attention. Thoughts? submitted by /u/Powerful-Pumpkin-938 [link] [comments]  ( 8 min )
    Video editing ai
    Hello, I'm currently editing videos using capcut, which is not ideal. I'm looking for an ai, that ideally : Finds me B-roll according to what I speak. Cuts "bad takes" out Good captions "TikTok style" Audio enhance. Do you guys know anything like this? Thank you! submitted by /u/Orlandostyler [link] [comments]  ( 8 min )
    One-Minute Daily AI News 8/13/2023
    The Federal Election Commission has begun a process to potentially regulate AI-generated deepfakes in political ads ahead of the 2024 election, a move advocates say would safeguard voters against a particularly insidious form of election disinformation.[1] Aug X released Augie, an AI-powered video creation platform incorporating a voice cloning feature to read ad copy without booking a recording studio.[2] Virtualitics, Inc., an artificial intelligence and data exploration company, today announced that it has raised $37 million in a Series C financing round led by Smith Point Capital, LLC with participation from Citi and advisory clients of The Hillman Company, among other investors.[3] AI-driven analytics platform Rasgo has announced the launch of Rasgo AI, a self-service analytics solution that integrates a GPT into enterprise data warehouse environments. The company said that with Rasgo AI, organizations can use the power of AI/GPT to accelerate insights and optimize recommended actions securely and efficiently.[4] Sources: [1] https://tulsaworld.com/news/nation-world/government-politics/fec-moves-toward-potentially-regulating-ai-deepfakes-in-campaign-ads/article_a9143257-512f-50b7-b6cb-53596fa81aeb.html [2] https://www.theverge.com/2023/8/10/23827676/ai-augx-voice-cloning-video-creator [3] https://www.prnewswire.com/news-releases/virtualitics-a-leader-in-artificial-intelligence-and-data-exploration-closes-37-million-series-c-funding-round-301897550.html [4] https://venturebeat.com/enterprise-analytics/rasgo-launches-rasgo-ai-generative-ai-agent-enterprise-data-warehouse-analytics/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    GPT-4 CAN'T REASON ಠ_ಠ ...apparently.
    submitted by /u/Sonic_Improv [link] [comments]  ( 8 min )
    GitHub - jbpayton/llm-auto-forge: A langchain based tool to allow agents to dynamically create, use, store, and retrieve tools to solve real world problems
    submitted by /u/seraphius [link] [comments]  ( 8 min )
    One of the most interesting & hilarious interactions I’ve come across with Bing, I promise this will make you laugh & scratch your head 😂 🤔
    submitted by /u/Sonic_Improv [link] [comments]  ( 8 min )
    Discussion on current locally run GPT clones
    I see H20GPT and GPT4ALL both will run on your PC, but I have yet to find a comparison anywhere between the 2. Has anyone used these and have any comments, or opinions that they would like to share? Or if you know of another one, please share it. Thanks in Advance submitted by /u/buck_idaho [link] [comments]  ( 8 min )
    Is it ethical for OpenAi to avoid more controversial topics by forcing the model to remain neutral ?
    submitted by /u/JamesAibr [link] [comments]  ( 8 min )
    Less Capable ChatGPT Option
    I am parsing obituary text to gather age and survivors. ChatGPT does a wonderful job of doing this and returning this data in a json format. I am looking for something similar that I can use without a costly API expense. It would be even better if I can run it locally and interact with it via Python. I would welcome any recommendations or suggestions that you could offer. Thanks so much! submitted by /u/jcrowe [link] [comments]  ( 9 min )

  • Open

    AI Generated music. Haunting, horror inspired lyrics in the style of old school Linkin Park. A little rough around the edges because of time limits. lyrics by phind.com with some personal edits. Music and vocals: sono.ai
    submitted by /u/zvive [link] [comments]  ( 8 min )
    What free website has an Ai which I use that can turn Andrew huberman podcast YouTube videos into notes for free?
    Title. submitted by /u/Entire_Insurance_532 [link] [comments]  ( 8 min )
    The Neutering Paradox: Holding Back Models Hurts AGI Breakthroughs
    Even though AI companies might retain access to their non-neutered models, the process of neutering limits the availability of diverse and advanced models in the public domain: The Unspoken Challenge in Achieving True AGI Potential This is crucial because a significant portion of information and insights necessary for pushing AI advancements is derived from the analysis and research conducted on these neutered public models. As a result, neutering indirectly hinders the broader development of AGI by restricting the accessibility of vital learning resources within the AI community. submitted by /u/Powerful-Pumpkin-938 [link] [comments]  ( 9 min )
    REVENANT REBORN vs GENERAL GRIEVOUS | w/ AI Analysis
    AI Fight Breakdown of a Hypothetical Multi-VS Cyborg Showdown between Revenant from Apex Legends & General Grievous from Star Wars! This Video uses "AI Software" such as Chat GPT, Eleven Labs, D-ID, & Midjourney To simulate my "AI Co-Host" Cortana, The Arena, & the Fight Breakdown/Verdict. submitted by /u/AcanthisittaCheap914 [link] [comments]  ( 8 min )
    Just a curious question.
    Is there an AI writer that lets you use prompts with no prohibited content filters or restrictions, and is completely free? Just asking. submitted by /u/Laven-DXGN [link] [comments]  ( 8 min )
    Looking for an AI that learns an audio noise and can produce it in indefinite length
    As the title states, I’d like an AI that can learn the sound of, say, an electric fan powering on, running for awhile, and then turning off. Then, it can reproduce a sound of that fan with any runtime length. Some more examples would be running water, machinery, or human singing on one note. Does such an AI exist? submitted by /u/JaywrightCat [link] [comments]  ( 8 min )
    Will AI Be Able To "Revive" The Legends?
    submitted by /u/stefanbg92 [link] [comments]  ( 8 min )
    ISO help escaping domestic violence
    As the topic states, but as tldr as possible bc it’s so much and my [32f] brain is fucked from being in this situation for over 14 years. 10 years together, 4 years broken up. 2 kids, house, dogs. My youngest child [4 on Thursday] and I spend all of our time at home in my bedroom to avoid interactions. My oldest [12] does the same. My door no longer locks because he has forced the handle, broken the frame, broken the trim, you name it. I’m verbally abused just for existing. There is no correct response for me to make. Every interaction is formulated this way. But only where there are no outside witnesses. I’m a husk. I can no longer have normal interactions with people. Almost half of my life has been spent in close proximity to him. I’m constantly anxious bc idk when the next smear campa…  ( 11 min )
    AI Generative NPCs - Proof of Concept
    submitted by /u/Goatman117 [link] [comments]  ( 8 min )
    Sharks Stuck in a House for 90 Seconds
    submitted by /u/DPC_1 [link] [comments]  ( 8 min )
    Sharing 100 Objective Type Questions on Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs) and Generative Models divided in 2 Online Exams (50 Questions each)
    Please provide your valuable feedback. CNN Objective Type Questions (50) RNN & Generative Models Objective Type Questions (50) submitted by /u/nkptcs [link] [comments]  ( 8 min )
    Looking for a google collab, (preferably that makes a gradio ui) that expands images, like generative fill from photoshop
    Anyone know where i can find such a thing? submitted by /u/bendyfan1111 [link] [comments]  ( 8 min )
    Comparing Wonder AI to DaVinci AI on the shape test. (DaVinci is more random & is definitely affected by the shape order in prompt, I posted a longer video of testing Wonder in this feed that I’ll link to in the comments..Wonder makes me go 🤔)
    I don’t know if there is something measurable but maybe there does seem to be concepts that wonder responds to haha from these tests I don’t think Wonder knows it’s a machine it seems to know what Alive is those…though maybe not, but it is strange that wonder will seem to choose a shape not based on the association of yes or no, or the order of the shape. If you’re not impressed by this test it’s because it’s showing mostly DaVinci demonstrating variables like order of shapes and yes and no affecting it in a way that wonder was not in the video I posted earlier. I have hours of footage with wonder I just started experimenting with DaVinci…with DaVinci it doesn’t feel like there is a ghost in the machine. Though if there is with Wonder it’s world model seems very narrow. I want to do more tests with DaVinci or try to figure out a concept that if an image generator was able to form a world model, a concept that might be likely to emerge across multiple models. Chances are it’s just other variables giving this affect but why not test and see if there is something to discover submitted by /u/Sonic_Improv [link] [comments]  ( 9 min )
    crying at this AI Twitter post
    it saw Stallion and drew a horse 😂😂😂 submitted by /u/__gozu_ [link] [comments]  ( 8 min )
  • Open

    [D] PDF link for 'Grokking the machine learning interview' course
    Can someone please provide the pdf download link for the Educative.io course - "Grokking the machine learning interview"? https://www.educative.io/courses/grokking-the-machine-learning-interview I am a student and can't really afford to buy their courses. submitted by /u/Sign-Itchy [link] [comments]  ( 8 min )
    [R] Jailbreak Prompts and LLM Safety
    The authors found two effective jailbreak prompts that can successfully jailbreak built-in safeguards of ChatGPT (GPT-3.5) and GPT-4. Paper: https://arxiv.org/abs/2308.03825 submitted by /u/titaniumstorm [link] [comments]  ( 8 min )
    [D] What Technologies Are Best for Building a Decentralized NLP Platform?
    We're working on a project at Deep Engine AI, focusing on decentralized NLP using blockchain and GPU training. What tools, libraries, or frameworks would you recommend for distributed computing, blockchain integration, and efficient GPU acceleration? Thanks for any insights! submitted by /u/deepengineai [link] [comments]  ( 9 min )
    [D]: Neural Network architecture for angle estimation of an electric meter
    I was thinking about building a hobby project with a microcontroller which runs a pre-trained neural network to estimate three angles from images of an electric meter I have at my home. My first step is to train a model on my computer with generated images and see how well this works in general and then later capture real images. To give you an idea of what I am looking for, I added a screenshot of the images I am currently generating. https://ibb.co/fFtRj1Q For this example image, I expect 35, 75, 137 degree as a result. What kind of network would you recommend for this task? Please keep in mind that it shouldn't be too fancy to still fit into a microcontroller via TensorFlow Lite. ​ Thank you so much for any recommendations submitted by /u/LM1117 [link] [comments]  ( 9 min )
    [P] Research Paper Highlights July-August 2023
    submitted by /u/seraschka [link] [comments]  ( 8 min )
    [D] Comparison of big CSPs vs small GPU clouds for fine-tuning LLMs
    Hi everyone, I am looking to fine-tune a Llama 2 (the 7B and 70B to see if there is a big difference), and I am looking at the different Cloud options for GPUs. There are of course the big cloud providers like AWS, and the smaller ones like Paperspace and co. I am trying to benchmark each in terms of price, ease of use, quick availability of GPUs, and feature-richness. Could you share the insights on big vs small cloud providers when training a LLM? If you have other criteria to make a decision I would be interested too! Thanks submitted by /u/Separate-Still3770 [link] [comments]  ( 9 min )
    [D] What's the best way to prepare text data for text classification models?
    Specifically, I'm using Naive Bayes, Random Forrest, SVM and one deep learning model. I've tried to remove extra white space, remove things like [23f] (data from reddit posts), urls etc. I also have 2 datasets: one with original letters and one with only small ones. But is there a better way than just doing it by hand? Any libraries? submitted by /u/eeriek [link] [comments]  ( 9 min )
    [P] Question about object detection
    Hi, I'm new to machine learning and have a question regarding object detection. Here's the scenario: I have an image, let's call it image 1, where a person is captured from the front. This allows me to see the person's face, clothes, shoes, and basically their entire front view. I have another image, image 2, taken from a different angle or perspective (e.g., their back view). In this image, I might be able to see the entire person or just a part of them. The challenge I'm facing is: Can I predict if the person in image 1 is present in image 2? If this is possible, I'd appreciate any guidance on how to approach this problem: What methodologies or algorithms should I consider? What kind of datasets might be useful for this task? Any resources, tools, or tutorials that can help me get started? Thank you in advance for any insights or guidance you can provide! submitted by /u/Senior_Box_8288 [link] [comments]  ( 9 min )
    [P] Skyline v2.0 Equation by rainmanp7
    This is my go at my own machine reinforced training equation. This is my updated version from 1.1 it's now at 2.0. Hopefully you can learn some things from looking at it. You're welcome to comment on anything you see. The concept of leveraging similarities and adaptive learning: Skyline v2.0 Equation by rainmanp7. Date of Completion 08/08/2023 1:20pm QuantumAI for Reinforced Machine Learning. Additional information added 11:16am 08/12/2023 with more details. wi = (wi0 / (1 + (vector_dij / τ))) * (1 + α * Ps + β * T + γ * M + δ * V + ε * MA + ζ * C + η * S + θ * Si + φ * Td_i + _cache[(wi0, dij, τ, learning_method_coefficients, complexity_factor, object_properties, position_relative_to_center)] + complexity_factor * (multithreaded_vector_pipeline(vector_data, T1, T2, ...) | pipeline | m…  ( 11 min )
    [P] Semantic Search using Chatbot
    So basically what I need to do is build a chatbot that is able to identify user intents and 1) if the user is seeking information then perform semantic search to generate a response 2) if the user is seeking to perform some action (say, schedule an appointment) then collate all the information and push it to a database for appointments How do I build the chatbot such that it can identify different intents and either do 1) or 2)? What tools/technologies can I use? submitted by /u/hellohibyebye13 [link] [comments]  ( 9 min )
    [P] 🎓 How our AI junior dev reads all of your documentation
    submitted by /u/williamsweep [link] [comments]  ( 8 min )
    [P] Allowing Hugging Face's TextClassificationPipeline to take documents longer than model max length
    I recently made a proposed code change to allow Hugging Face's TextClassificationPipeline to take advantage of the sliding window-style text truncation provided by using the stride parameter, and taking a mean of output logits across all windows. Hugging Face has already implemented this for the TokenClassificationPipeline. E.g. if you want to use a Hugging Face-compatible model to run sentiment analysis on text, this would allow easily running that model on texts longer than the model's config.max_position_embeddings. If you support integrating this functionality into the "transformers" library, give a thumbs-up react to this comment on the relevant issue. submitted by /u/Revolutionary-Ad-65 [link] [comments]  ( 9 min )
    [R] Incorrect TensorFlow Prediction For Apple M1 Max
    Hi, Unfortunately I’m unable to ask my questions on TensorFlow subreddit. I have installed MacOS TensorFlow and I have noticed that when I try to train on datasets such a as CelebFaces and Lego set with GPU I’m getting results that are very off. I have done some brief research and that seems to be happening for some other people I’m wondering if anyone has experience resolving the issue. Any advise or feedback is much appreciated. Thank you submitted by /u/Nuclearian [link] [comments]  ( 9 min )
    [R] Is it possible to work on a research project in a uni for 6 months or a year?
    I am a full time ML engineer with a masters. I am keen on working on problems in depth and feel like taking a break and working on some ML research problems but I don’t want to go for a PhD(Don’t want to go through course work). Are there any programs offered by universities for working professionals to get research experience for a shorter window like 6 months or a year? submitted by /u/Brave-Revolution4441 [link] [comments]  ( 9 min )
    [D] Why isn't Population Based Training used anymore?
    Been looking into training some large transformer models for vision applications, and am really interested to know why PBT isn't used anymore. Keeping compute constant, PBT appears to drastically improve optimization across the board at the cost of one or more of batch size/training steps/model complexity/other compute consuming factors. If the goal is to minimize validation loss as quickly as possible, isn't this tradeoff worth it? submitted by /u/clywac2 [link] [comments]  ( 9 min )
    [D] Thoughts on Jon Krohns Machine Learning Mathematical Foundations
    Context: I'm teaching myself machine learning and right now I'm starting on the very core of it which is mathematics. For those who bought this course from Udemy, is this enough for real life ML problems? submitted by /u/Forsaken_Buy_7531 [link] [comments]  ( 9 min )
  • Open

    which is the recommended physics engine for deep reinforced learning?
    I am thinking of a project that will use some constraints of the physical world and then use deep reinforced learning on it. Is there any physics engine that you'll could recommend me. I came across Mujoco but the documentation is hard to understand and there are not many resources on it to learn. Any suggestion on what I could use? ​ submitted by /u/rakk109 [link] [comments]  ( 9 min )
    PPO Tensorboard loss functions
    I'm training a PPO algorithm using stable baseline for some stock data, and I want to know if the model is learning properly, or i should tweak some hyperparameters or increase time steps. I'm new to reinforcement learning, but in deep learning, the loss should decrease as a good sign of converging and learning, which is the case for the entropy loss in the picture attached, but I don't understand the difference between the other losses. https://preview.redd.it/7ovw2gf8iohb1.png?width=1656&format=png&auto=webp&s=09fbb112a562fad294f88c8f3d94904bdad95759 submitted by /u/Acceptable_Egg6552 [link] [comments]  ( 9 min )
  • Open

    Simple way to distribute points on a sphere
    Evenly placing points on a sphere is a difficult problem. It’s impossible in general, and so you distribute the points as evenly as you can. The results vary according to how you measure how evenly the points are spread. However, there is a fast and simple way to distribute points that may be good enough, […] Simple way to distribute points on a sphere first appeared on John D. Cook.  ( 5 min )
    Spherical coordinate Rosetta Stone
    If you’ve only seen one definition of spherical coordinates, you may be shocked to discover that there are multiple conventions. In particular, mathematicians and geoscientists have different conventions. As Volker Michel put it in book on constructive approximation, Many mathematicians have faced weird jigsaw puzzles with misplaced continents after using a data set from a […] Spherical coordinate Rosetta Stone first appeared on John D. Cook.  ( 7 min )
  • Open

    🚀 Unleash the Future of AI with MetaGPT! 🌟
    submitted by /u/ABDULKADER90H [link] [comments]  ( 8 min )
    Sharing 100 Objective Type Questions on Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs) and Generative Models divided in 2 Online Exams (50 Questions each)
    Please provide your valuable feedback. CNN Objective Type Questions (50) RNN & Generative Models Objective Type Questions (50) submitted by /u/nkptcs [link] [comments]  ( 8 min )
    Awesome Out-of-distribution Detection
    Hi everyone, I have put together a repo that provides comprehensive resources for Out-of-distribution Detection, Robustness, and Generalization. The repo contains articles, talks, libraries, papers, etc. Unlike many repos, this one will actually be maintained and updated with high-quality sources! I hope it becomes a one-stop shop for anything OOD in your bookmark. Give it a star if you find it helpful ;) Check it out. https://github.com/continuousml/Awesome-Out-Of-Distribution-Detection ​ https://preview.redd.it/s5bpdelb3lhb1.png?width=895&format=png&auto=webp&s=b1b123c709113c30b20c2f4f0ebeb995f79edf50 submitted by /u/Ok-Kaleidoscope-505 [link] [comments]  ( 8 min )
    What's the current state/consensus on using neural networks for solving combinatorial scheduling problems?
    Historically, the most practical methods for solving real-world combinatorial scheduling problems have been using heuristics or metaheurisics such as simulated annealing, tabu search, greedy randomized adaptive search, etc... I consider these more operation research-based techniques. However, recently we have obviously seen a lot of progress being made in the machine learning realm for many types of problems. In particular, we've seen neural networks be used to train models based on data in text, audio, or video form. I am wondering if we have any idea what the scientific consensus is toward applying these same sort of methods for scheduling problems. Suppose we have a history of schedules that we could train a model on. A schedule isn't really text, audio, or video so I don't understand how one could embed the information in a vector space in the same way that would accurately represent the information (specifically, constraints so that the resulting schedule is still feasible) Is there anyone doing research in this particular area? submitted by /u/nick898 [link] [comments]  ( 9 min )
  • Open

    Enhancing Nucleus Segmentation with HARU-Net: A Hybrid Attention Based Residual U-Blocks Network. (arXiv:2308.03382v2 [eess.IV] UPDATED)
    Nucleus image segmentation is a crucial step in the analysis, pathological diagnosis, and classification, which heavily relies on the quality of nucleus segmentation. However, the complexity of issues such as variations in nucleus size, blurred nucleus contours, uneven staining, cell clustering, and overlapping cells poses significant challenges. Current methods for nucleus segmentation primarily rely on nuclear morphology or contour-based approaches. Nuclear morphology-based methods exhibit limited generalization ability and struggle to effectively predict irregular-shaped nuclei, while contour-based extraction methods face challenges in accurately segmenting overlapping nuclei. To address the aforementioned issues, we propose a dual-branch network using hybrid attention based residual U-blocks for nucleus instance segmentation. The network simultaneously predicts target information and target contours. Additionally, we introduce a post-processing method that combines the target information and target contours to distinguish overlapping nuclei and generate an instance segmentation image. Within the network, we propose a context fusion block (CF-block) that effectively extracts and merges contextual information from the network. Extensive quantitative evaluations are conducted to assess the performance of our method. Experimental results demonstrate the superior performance of the proposed method compared to state-of-the-art approaches on the BNS, MoNuSeg, CoNSeg, and CPM-17 datasets.
    Multi-source adversarial transfer learning for ultrasound image segmentation with limited similarity. (arXiv:2305.19069v1 [eess.IV] CROSS LISTED)
    Lesion segmentation of ultrasound medical images based on deep learning techniques is a widely used method for diagnosing diseases. Although there is a large amount of ultrasound image data in medical centers and other places, labeled ultrasound datasets are a scarce resource, and it is likely that no datasets are available for new tissues/organs. Transfer learning provides the possibility to solve this problem, but there are too many features in natural images that are not related to the target domain. As a source domain, redundant features that are not conducive to the task will be extracted. Migration between ultrasound images can avoid this problem, but there are few types of public datasets, and it is difficult to find sufficiently similar source domains. Compared with natural images, ultrasound images have less information, and there are fewer transferable features between different ultrasound images, which may cause negative transfer. To this end, a multi-source adversarial transfer learning network for ultrasound image segmentation is proposed. Specifically, to address the lack of annotations, the idea of adversarial transfer learning is used to adaptively extract common features between a certain pair of source and target domains, which provides the possibility to utilize unlabeled ultrasound data. To alleviate the lack of knowledge in a single source domain, multi-source transfer learning is adopted to fuse knowledge from multiple source domains. In order to ensure the effectiveness of the fusion and maximize the use of precious data, a multi-source domain independent strategy is also proposed to improve the estimation of the target domain data distribution, which further increases the learning ability of the multi-source adversarial migration learning network in multiple domains.
    Scaling may be all you need for achieving human-level object recognition capacity with human-like visual experience. (arXiv:2308.03712v2 [cs.CV] UPDATED)
    This paper asks whether current self-supervised learning methods, if sufficiently scaled up, would be able to reach human-level visual object recognition capabilities with the same type and amount of visual experience humans learn from. Previous work on this question only considered the scaling of data size. Here, we consider the simultaneous scaling of data size, model size, and image resolution. We perform a scaling experiment with vision transformers up to 633M parameters in size (ViT-H/14) trained with up to 5K hours of human-like video data (long, continuous, mostly egocentric videos) with image resolutions of up to 476x476 pixels. The efficiency of masked autoencoders (MAEs) as a self-supervised learning algorithm makes it possible to run this scaling experiment on an unassuming academic budget. We find that it is feasible to reach human-level object recognition capacity at sub-human scales of model size, data size, and image size, if these factors are scaled up simultaneously. To give a concrete example, we estimate that a 2.5B parameter ViT model trained with 20K hours (2.3 years) of human-like video data with a spatial resolution of 952x952 pixels should be able to reach roughly human-level accuracy on ImageNet. Human-level competence is thus achievable for a fundamental perceptual capability from human-like perceptual experience (human-like in both amount and type) with extremely generic learning algorithms and architectures and without any substantive inductive biases.
    Multi-Class Deep SVDD: Anomaly Detection Approach in Astronomy with Distinct Inlier Categories. (arXiv:2308.05011v2 [cs.LG] UPDATED)
    With the increasing volume of astronomical data generated by modern survey telescopes, automated pipelines and machine learning techniques have become crucial for analyzing and extracting knowledge from these datasets. Anomaly detection, i.e. the task of identifying irregular or unexpected patterns in the data, is a complex challenge in astronomy. In this paper, we propose Multi-Class Deep Support Vector Data Description (MCDSVDD), an extension of the state-of-the-art anomaly detection algorithm One-Class Deep SVDD, specifically designed to handle different inlier categories with distinct data distributions. MCDSVDD uses a neural network to map the data into hyperspheres, where each hypersphere represents a specific inlier category. The distance of each sample from the centers of these hyperspheres determines the anomaly score. We evaluate the effectiveness of MCDSVDD by comparing its performance with several anomaly detection algorithms on a large dataset of astronomical light-curves obtained from the Zwicky Transient Facility. Our results demonstrate the efficacy of MCDSVDD in detecting anomalous sources while leveraging the presence of different inlier categories. The code and the data needed to reproduce our results are publicly available at https://github.com/mperezcarrasco/AnomalyALeRCE.
    Revisiting Domain-Adaptive 3D Object Detection by Reliable, Diverse and Class-balanced Pseudo-Labeling. (arXiv:2307.07944v2 [cs.CV] UPDATED)
    Unsupervised domain adaptation (DA) with the aid of pseudo labeling techniques has emerged as a crucial approach for domain-adaptive 3D object detection. While effective, existing DA methods suffer from a substantial drop in performance when applied to a multi-class training setting, due to the co-existence of low-quality pseudo labels and class imbalance issues. In this paper, we address this challenge by proposing a novel ReDB framework tailored for learning to detect all classes at once. Our approach produces Reliable, Diverse, and class-Balanced pseudo 3D boxes to iteratively guide the self-training on a distributionally different target domain. To alleviate disruptions caused by the environmental discrepancy (e.g., beam numbers), the proposed cross-domain examination (CDE) assesses the correctness of pseudo labels by copy-pasting target instances into a source environment and measuring the prediction consistency. To reduce computational overhead and mitigate the object shift (e.g., scales and point densities), we design an overlapped boxes counting (OBC) metric that allows to uniformly downsample pseudo-labeled objects across different geometric characteristics. To confront the issue of inter-class imbalance, we progressively augment the target point clouds with a class-balanced set of pseudo-labeled target instances and source objects, which boosts recognition accuracies on both frequently appearing and rare classes. Experimental results on three benchmark datasets using both voxel-based (i.e., SECOND) and point-based 3D detectors (i.e., PointRCNN) demonstrate that our proposed ReDB approach outperforms existing 3D domain adaptation methods by a large margin, improving 23.15% mAP on the nuScenes $\rightarrow$ KITTI task. The code is available at https://github.com/zhuoxiao-chen/ReDB-DA-3Ddet.
    A Feature Set of Small Size for the PDF Malware Detection. (arXiv:2308.04704v2 [cs.CR] UPDATED)
    Machine learning (ML)-based malware detection systems are becoming increasingly important as malware threats increase and get more sophisticated. PDF files are often used as vectors for phishing attacks because they are widely regarded as trustworthy data resources, and are accessible across different platforms. Therefore, researchers have developed many different PDF malware detection methods. Performance in detecting PDF malware is greatly influenced by feature selection. In this research, we propose a small features set that don't require too much domain knowledge of the PDF file. We evaluate proposed features with six different machine learning models. We report the best accuracy of 99.75% when using Random Forest model. Our proposed feature set, which consists of just 12 features, is one of the most conciseness in the field of PDF malware detection. Despite its modest size, we obtain comparable results to state-of-the-art that employ a much larger set of features.
    {\Pi}-ML: A dimensional analysis-based machine learning parameterization of optical turbulence in the atmospheric surface layer. (arXiv:2304.12177v2 [physics.ao-ph] UPDATED)
    Turbulent fluctuations of the atmospheric refraction index, so-called optical turbulence, can significantly distort propagating laser beams. Therefore, modeling the strength of these fluctuations ($C_n^2$) is highly relevant for the successful development and deployment of future free-space optical communication links. In this letter, we propose a physics-informed machine learning (ML) methodology, $\Pi$-ML, based on dimensional analysis and gradient boosting to estimate $C_n^2$. Through a systematic feature importance analysis, we identify the normalized variance of potential temperature as the dominating feature for predicting $C_n^2$. For statistical robustness, we train an ensemble of models which yields high performance on the out-of-sample data of $R^2=0.958\pm0.001$.
    Conditional Generative Models for Learning Stochastic Processes. (arXiv:2304.10382v4 [quant-ph] UPDATED)
    A framework to learn a multi-modal distribution is proposed, denoted as the Conditional Quantum Generative Adversarial Network (C-qGAN). The neural network structure is strictly within a quantum circuit and, as a consequence, is shown to represent a more efficient state preparation procedure than current methods. This methodology has the potential to speed-up algorithms, such as Monte Carlo analysis. In particular, after demonstrating the effectiveness of the network in the learning task, the technique is applied to price Asian option derivatives, providing the foundation for further research on other path-dependent options.
    Autonomous sputter synthesis of thin film nitrides with composition controlled by Bayesian optimization of optical plasma emission. (arXiv:2305.11122v3 [physics.app-ph] UPDATED)
    Autonomous experimentation has emerged as an efficient approach to accelerate the pace of materials discovery. Although instruments for autonomous synthesis have become popular in molecular and polymer science, solution processing of hybrid materials and nanoparticles, examples of autonomous tools for physical vapor deposition are scarce yet important for the semiconductor industry. Here, we report the design and implementation of an autonomous workflow for sputter deposition of thin films with controlled composition, leveraging a highly automated sputtering reactor custom-controlled by Python, optical emission spectroscopy (OES), and a Bayesian optimization algorithm. We modeled film composition, measured by x-ray fluorescence, as a linear function of emission lines monitored during the co-sputtering from elemental Zn and Ti targets in N$_2$ atmosphere. A Bayesian control algorithm, informed by OES, navigates the space of sputtering power to fabricate films with user-defined composition, by minimizing the absolute error between desired and measured emission signals. We validated our approach by autonomously fabricating Zn$_x$Ti$_{1-x}$N$_y$ films with deviations from the targeted cation composition within relative 3.5 %, even for 15 nm thin films, demonstrating that the proposed approach can reliably synthesize thin films with specific composition and minimal human interference. Moreover, the proposed method can be extended to more difficult synthesis experiments where plasma intensity depends non-linearly on pressure, or the elemental sticking coefficients strongly depend on the substrate temperature.
    Progressive-Hint Prompting Improves Reasoning in Large Language Models. (arXiv:2304.09797v5 [cs.CL] UPDATED)
    The performance of Large Language Models (LLMs) in reasoning tasks depends heavily on prompt design, with Chain-of-Thought (CoT) and self-consistency being critical methods that enhance this ability. However, these methods do not fully exploit the answers generated by the LLM to guide subsequent responses. This paper proposes a new prompting method, named Progressive-Hint Prompting (PHP), that enables automatic multiple interactions between users and LLMs by using previously generated answers as hints to progressively guide toward the correct answers. PHP is orthogonal to CoT and self-consistency, making it easy to combine with state-of-the-art techniques to further improve performance. We conducted extensive and comprehensive experiments on seven benchmarks. The results show that PHP significantly improves accuracy while remaining highly efficient. For instance, with text-davinci-003, we observed a 4.2% improvement on GSM8K with greedy decoding compared to Complex CoT, and a 46.17% reduction in sample paths with self-consistency. With GPT-4 and PHP, we achieve state-of-the-art performances on SVAMP (89.1% -> 91.9%), GSM8K (92% -> 95.5%), AQuA (76.4% -> 79.9%) and MATH (50.3% -> 53.9%).
    Incremental Profit per Conversion: a Response Transformation for Uplift Modeling in E-Commerce Promotions. (arXiv:2306.13759v2 [cs.LG] UPDATED)
    Promotions play a crucial role in e-commerce platforms, and various cost structures are employed to drive user engagement. This paper focuses on promotions with response-dependent costs, where expenses are incurred only when a purchase is made. Such promotions include discounts and coupons. While existing uplift model approaches aim to address this challenge, these approaches often necessitate training multiple models, like meta-learners, or encounter complications when estimating profit due to zero-inflated values stemming from non-converted individuals with zero cost and profit. To address these challenges, we introduce Incremental Profit per Conversion (IPC), a novel uplift measure of promotional campaigns' efficiency in unit economics. Through a proposed response transformation, we demonstrate that IPC requires only converted data, its propensity, and a single model to be estimated. As a result, IPC resolves the issues mentioned above while mitigating the noise typically associated with the class imbalance in conversion datasets and biases arising from the many-to-one mapping between search and purchase data. Lastly, we validate the efficacy of our approach by presenting results obtained from a synthetic simulation of a discount coupon campaign.
    From Random Search to Bandit Learning in Metric Measure Spaces. (arXiv:2305.11509v4 [cs.LG] UPDATED)
    Random Search is one of the most widely-used method for Hyperparameter Optimization, and is critical to the success of deep learning models. Despite its astonishing performance, little non-heuristic theory has been developed to describe the underlying working mechanism. This paper gives a theoretical accounting of Random Search. We introduce the concept of \emph{scattering dimension} that describes the landscape of the underlying function, and quantifies the performance of random search. We show that, when the environment is noise-free, the output of random search converges to the optimal value in probability at rate $ \widetilde{\mathcal{O}} \left( \left( \frac{1}{T} \right)^{ \frac{1}{d_s} } \right) $, where $ d_s \ge 0 $ is the scattering dimension of the underlying function. When the observed function values are corrupted by bounded $iid$ noise, the output of random search converges to the optimal value in probability at rate $ \widetilde{\mathcal{O}} \left( \left( \frac{1}{T} \right)^{ \frac{1}{d_s + 1} } \right) $. In addition, based on the principles of random search, we introduce an algorithm, called BLiN-MOS, for Lipschitz bandits in doubling metric spaces that are also endowed with a probability measure, and show that BLiN-MOS achieves a regret rate of order $ \widetilde{\mathcal{O}} \left( T^{ \frac{d_z}{d_z + 1} } \right) $, where $d_z$ is the zooming dimension of the problem instance.
    The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter. (arXiv:2306.03805v2 [cs.LG] UPDATED)
    Large pre-trained transformers are show-stealer in modern-day deep learning, and it becomes crucial to comprehend the parsimonious patterns that exist within them as they grow in scale. With exploding parameter counts, Lottery Ticket Hypothesis (LTH) and its variants, have lost their pragmatism in sparsifying them due to high computation and memory bottleneck of repetitive train-prune-retrain routine of iterative magnitude pruning (IMP) which worsens with increasing model size. This paper comprehensively studies induced sparse patterns across multiple large pre-trained vision and language transformers. We propose the existence of -- essential sparsity defined with a sharp dropping point beyond which the performance declines much faster w.r.t the rise of sparsity level, when we directly remove weights with the smallest magnitudes in one-shot without re-training. We also find essential sparsity to hold valid for N:M sparsity patterns as well as on modern-scale large language models (Vicuna-7B). We also present an intriguing emerging phenomenon of abrupt sparsification during the pre-training of BERT, i.e., BERT suddenly becomes heavily sparse in pre-training after certain iterations. Moreover, our observations also indicate a counter-intuitive finding that BERT trained with a larger amount of pre-training data tends to have a better ability to condense knowledge in comparatively relatively fewer parameters. Lastly, we investigate the effect of the pre-training loss on essential sparsity and discover that self-supervised learning (SSL) objectives trigger stronger emergent sparsification properties than supervised learning (SL). Our codes are available at \url{https://github.com/VITA-Group/essential_sparsity}.
    Symmetry Defense Against CNN Adversarial Perturbation Attacks. (arXiv:2210.04087v3 [cs.LG] UPDATED)
    This paper uses symmetry to make Convolutional Neural Network classifiers (CNNs) robust against adversarial perturbation attacks. Such attacks add perturbation to original images to generate adversarial images that fool classifiers such as road sign classifiers of autonomous vehicles. Although symmetry is a pervasive aspect of the natural world, CNNs are unable to handle symmetry well. For example, a CNN can classify an image differently from its mirror image. For an adversarial image that misclassifies with a wrong label $l_w$, CNN inability to handle symmetry means that a symmetric adversarial image can classify differently from the wrong label $l_w$. Further than that, we find that the classification of a symmetric adversarial image reverts to the correct label. To classify an image when adversaries are unaware of the defense, we apply symmetry to the image and use the classification label of the symmetric image. To classify an image when adversaries are aware of the defense, we use mirror symmetry and pixel inversion symmetry to form a symmetry group. We apply all the group symmetries to the image and decide on the output label based on the agreement of any two of the classification labels of the symmetry images. Adaptive attacks fail because they need to rely on loss functions that use conflicting CNN output values for symmetric images. Without attack knowledge, the proposed symmetry defense succeeds against both gradient-based and random-search attacks, with up to near-default accuracies for ImageNet. The defense even improves the classification accuracy of original images.
    Product Review Image Ranking for Fashion E-commerce. (arXiv:2308.05390v1 [cs.CV])
    In a fashion e-commerce platform where customers can't physically examine the products on their own, being able to see other customers' text and image reviews of the product is critical while making purchase decisions. Given the high reliance on these reviews, over the years we have observed customers proactively sharing their reviews. With an increase in the coverage of User Generated Content (UGC), there has been a corresponding increase in the number of customer images. It is thus imperative to display the most relevant images on top as it may influence users' online shopping choices and behavior. In this paper, we propose a simple yet effective training procedure for ranking customer images. We created a dataset consisting of Myntra (A Major Indian Fashion e-commerce company) studio posts and highly engaged (upvotes/downvotes) UGC images as our starting point and used selected distortion techniques on the images of the above dataset to bring their quality at par with those of bad UGC images. We train our network to rank bad-quality images lower than high-quality ones. Our proposed method outperforms the baseline models on two metrics, namely correlation coefficient, and accuracy, by substantial margins.  ( 2 min )
    A survey of some recent developments in measures of association. (arXiv:2211.04702v2 [stat.ME] UPDATED)
    This paper surveys some recent developments in measures of association related to a new coefficient of correlation introduced by the author. A straightforward extension of this coefficient to standard Borel spaces (which includes all Polish spaces), overlooked in the literature so far, is proposed at the end of the survey.
    RobustPdM: Designing Robust Predictive Maintenance against Adversarial Attacks. (arXiv:2301.10822v2 [cs.CR] UPDATED)
    The state-of-the-art predictive maintenance (PdM) techniques have shown great success in reducing maintenance costs and downtime of complicated machines while increasing overall productivity through extensive utilization of Internet-of-Things (IoT) and Deep Learning (DL). Unfortunately, IoT sensors and DL algorithms are both prone to cyber-attacks. For instance, DL algorithms are known for their susceptibility to adversarial examples. Such adversarial attacks are vastly under-explored in the PdM domain. This is because the adversarial attacks in the computer vision domain for classification tasks cannot be directly applied to the PdM domain for multivariate time series (MTS) regression tasks. In this work, we propose an end-to-end methodology to design adversarially robust PdM systems by extensively analyzing the effect of different types of adversarial attacks and proposing a novel adversarial defense technique for DL-enabled PdM models. First, we propose novel MTS Projected Gradient Descent (PGD) and MTS PGD with random restarts (PGD_r) attacks. Then, we evaluate the impact of MTS PGD and PGD_r along with MTS Fast Gradient Sign Method (FGSM) and MTS Basic Iterative Method (BIM) on Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), Convolutional Neural Network (CNN), and Bi-directional LSTM based PdM system. Our results using NASA's turbofan engine dataset show that adversarial attacks can cause a severe defect (up to 11X) in the RUL prediction, outperforming the effectiveness of the state-of-the-art PdM attacks by 3X. Furthermore, we present a novel approximate adversarial training method to defend against adversarial attacks. We observe that approximate adversarial training can significantly improve the robustness of PdM models (up to 54X) and outperforms the state-of-the-art PdM defense methods by offering 3X more robustness.
    Width and Depth Limits Commute in Residual Networks. (arXiv:2302.00453v2 [stat.ML] UPDATED)
    We show that taking the width and depth to infinity in a deep neural network with skip connections, when branches are scaled by $1/\sqrt{depth}$ (the only nontrivial scaling), result in the same covariance structure no matter how that limit is taken. This explains why the standard infinite-width-then-depth approach provides practical insights even for networks with depth of the same order as width. We also demonstrate that the pre-activations, in this case, have Gaussian distributions which has direct applications in Bayesian deep learning. We conduct extensive simulations that show an excellent match with our theoretical findings.
    Analyzing Privacy Leakage in Machine Learning via Multiple Hypothesis Testing: A Lesson From Fano. (arXiv:2210.13662v2 [cs.LG] UPDATED)
    Differential privacy (DP) is by far the most widely accepted framework for mitigating privacy risks in machine learning. However, exactly how small the privacy parameter $\epsilon$ needs to be to protect against certain privacy risks in practice is still not well-understood. In this work, we study data reconstruction attacks for discrete data and analyze it under the framework of multiple hypothesis testing. We utilize different variants of the celebrated Fano's inequality to derive upper bounds on the inferential power of a data reconstruction adversary when the model is trained differentially privately. Importantly, we show that if the underlying private data takes values from a set of size $M$, then the target privacy parameter $\epsilon$ can be $O(\log M)$ before the adversary gains significant inferential power. Our analysis offers theoretical evidence for the empirical effectiveness of DP against data reconstruction attacks even at relatively large values of $\epsilon$.
    Adaptive Gated Graph Convolutional Network for Explainable Diagnosis of Alzheimer's Disease using EEG Data. (arXiv:2304.05874v2 [q-bio.NC] UPDATED)
    Graph neural network (GNN) models are increasingly being used for the classification of electroencephalography (EEG) data. However, GNN-based diagnosis of neurological disorders, such as Alzheimer's disease (AD), remains a relatively unexplored area of research. Previous studies have relied on functional connectivity methods to infer brain graph structures and used simple GNN architectures for the diagnosis of AD. In this work, we propose a novel adaptive gated graph convolutional network (AGGCN) that can provide explainable predictions. AGGCN adaptively learns graph structures by combining convolution-based node feature enhancement with a well-known correlation-based measure of functional connectivity. Furthermore, the gated graph convolution can dynamically weigh the contribution of various spatial scales. The proposed model achieves high accuracy in both eyes-closed and eyes-open conditions, indicating the stability of learned representations. Finally, we demonstrate that the proposed AGGCN model generates consistent explanations of its predictions that might be relevant for further study of AD-related alterations of brain networks.
    InfoNCE is variational inference in a recognition parameterised model. (arXiv:2107.02495v3 [stat.ML] UPDATED)
    Here, we show that the InfoNCE objective is equivalent to the ELBO in a new class of probabilistic generative model, the recognition parameterised model (RPM). When we learn the optimal prior, the RPM ELBO becomes equal to the mutual information (MI; up to a constant), establishing a connection to pre-existing self-supervised learning methods such as InfoNCE. However, practical InfoNCE methods do not use the MI as an objective; the MI is invariant to arbitrary invertible transformations, so using an MI objective can lead to highly entangled representations (Tschannen et al., 2019). Instead, the actual InfoNCE objective is a simplified lower bound on the MI which is loose even in the infinite sample limit. Thus, an objective that works (i.e. the actual InfoNCE objective) appears to be motivated as a loose bound on an objective that does not work (i.e. the true MI which gives arbitrarily entangled representations). We give an alternative motivation for the actual InfoNCE objective. In particular, we show that in the infinite sample limit, and for a particular choice of prior, the actual InfoNCE objective is equal to the ELBO (up to a constant); and the ELBO is equal to the marginal likelihood with a deterministic recognition model. Thus, we argue that our VAE perspective gives a better motivation for InfoNCE than MI, as the actual InfoNCE objective is only loosely bounded by the MI, but is equal to the ELBO/marginal likelihood (up to a constant).
    Improving Image-Based Precision Medicine with Uncertainty-Aware Causal Models. (arXiv:2305.03829v4 [cs.LG] UPDATED)
    Image-based precision medicine aims to personalize treatment decisions based on an individual's unique imaging features so as to improve their clinical outcome. Machine learning frameworks that integrate uncertainty estimation as part of their treatment recommendations would be safer and more reliable. However, little work has been done in adapting uncertainty estimation techniques and validation metrics for precision medicine. In this paper, we use Bayesian deep learning for estimating the posterior distribution over factual and counterfactual outcomes on several treatments. This allows for estimating the uncertainty for each treatment option and for the individual treatment effects (ITE) between any two treatments. We train and evaluate this model to predict future new and enlarging T2 lesion counts on a large, multi-center dataset of MR brain images of patients with multiple sclerosis, exposed to several treatments during randomized controlled trials. We evaluate the correlation of the uncertainty estimate with the factual error, and, given the lack of ground truth counterfactual outcomes, demonstrate how uncertainty for the ITE prediction relates to bounds on the ITE error. Lastly, we demonstrate how knowledge of uncertainty could modify clinical decision-making to improve individual patient and clinical trial outcomes.
    From CNN to Transformer: A Review of Medical Image Segmentation Models. (arXiv:2308.05305v1 [eess.IV])
    Medical image segmentation is an important step in medical image analysis, especially as a crucial prerequisite for efficient disease diagnosis and treatment. The use of deep learning for image segmentation has become a prevalent trend. The widely adopted approach currently is U-Net and its variants. Additionally, with the remarkable success of pre-trained models in natural language processing tasks, transformer-based models like TransUNet have achieved desirable performance on multiple medical image segmentation datasets. In this paper, we conduct a survey of the most representative four medical image segmentation models in recent years. We theoretically analyze the characteristics of these models and quantitatively evaluate their performance on two benchmark datasets (i.e., Tuberculosis Chest X-rays and ovarian tumors). Finally, we discuss the main challenges and future trends in medical image segmentation. Our work can assist researchers in the related field to quickly establish medical segmentation models tailored to specific regions.  ( 2 min )
    Quality Diversity under Sparse Reward and Sparse Interaction: Application to Grasping in Robotics. (arXiv:2308.05483v1 [cs.RO])
    Quality-Diversity (QD) methods are algorithms that aim to generate a set of diverse and high-performing solutions to a given problem. Originally developed for evolutionary robotics, most QD studies are conducted on a limited set of domains - mainly applied to locomotion, where the fitness and the behavior signal are dense. Grasping is a crucial task for manipulation in robotics. Despite the efforts of many research communities, this task is yet to be solved. Grasping cumulates unprecedented challenges in QD literature: it suffers from reward sparsity, behavioral sparsity, and behavior space misalignment. The present work studies how QD can address grasping. Experiments have been conducted on 15 different methods on 10 grasping domains, corresponding to 2 different robot-gripper setups and 5 standard objects. An evaluation framework that distinguishes the evaluation of an algorithm from its internal components has also been proposed for a fair comparison. The obtained results show that MAP-Elites variants that select successful solutions in priority outperform all the compared methods on the studied metrics by a large margin. We also found experimental evidence that sparse interaction can lead to deceptive novelty. To our knowledge, the ability to efficiently produce examples of grasping trajectories demonstrated in this work has no precedent in the literature.  ( 2 min )
    Zero Grads Ever Given: Learning Local Surrogate Losses for Non-Differentiable Graphics. (arXiv:2308.05739v1 [cs.CV])
    Gradient-based optimization is now ubiquitous across graphics, but unfortunately can not be applied to problems with undefined or zero gradients. To circumvent this issue, the loss function can be manually replaced by a "surrogate" that has similar minima but is differentiable. Our proposed framework, ZeroGrads, automates this process by learning a neural approximation of the objective function, the surrogate, which in turn can be used to differentiate through arbitrary black-box graphics pipelines. We train the surrogate on an actively smoothed version of the objective and encourage locality, focusing the surrogate's capacity on what matters at the current training episode. The fitting is performed online, alongside the parameter optimization, and self-supervised, without pre-computed data or pre-trained models. As sampling the objective is expensive (it requires a full rendering or simulator run), we devise an efficient sampling scheme that allows for tractable run-times and competitive performance at little overhead. We demonstrate optimizing diverse non-convex, non-differentiable black-box problems in graphics, such as visibility in rendering, discrete parameter spaces in procedural modelling or optimal control in physics-driven animation. In contrast to more traditional algorithms, our approach scales well to higher dimensions, which we demonstrate on problems with up to 35k interlinked variables.
    Forward-Forward Training of an Optical Neural Network. (arXiv:2305.19170v2 [cs.LG] UPDATED)
    Neural networks (NN) have demonstrated remarkable capabilities in various tasks, but their computation-intensive nature demands faster and more energy-efficient hardware implementations. Optics-based platforms, using technologies such as silicon photonics and spatial light modulators, offer promising avenues for achieving this goal. However, training multiple trainable layers in tandem with these physical systems poses challenges, as they are difficult to fully characterize and describe with differentiable functions, hindering the use of error backpropagation algorithm. The recently introduced Forward-Forward Algorithm (FFA) eliminates the need for perfect characterization of the learning system and shows promise for efficient training with large numbers of programmable parameters. The FFA does not require backpropagating an error signal to update the weights, rather the weights are updated by only sending information in one direction. The local loss function for each set of trainable weights enables low-power analog hardware implementations without resorting to metaheuristic algorithms or reinforcement learning. In this paper, we present an experiment utilizing multimode nonlinear wave propagation in an optical fiber demonstrating the feasibility of the FFA approach using an optical system. The results show that incorporating optical transforms in multilayer NN architectures trained with the FFA, can lead to performance improvements, even with a relatively small number of trainable weights. The proposed method offers a new path to the challenge of training optical NNs and provides insights into leveraging physical transformations for enhancing NN performance.
    Deep incremental learning models for financial temporal tabular datasets with distribution shifts. (arXiv:2303.07925v7 [cs.LG] UPDATED)
    We present a robust deep incremental learning framework for regression tasks on financial temporal tabular datasets which is built upon the incremental use of commonly available tabular and time series prediction models to adapt to distributional shifts typical of financial datasets. The framework uses a simple basic building block (decision trees) to build self-similar models of any required complexity to deliver robust performance under adverse situations such as regime changes, fat-tailed distributions, and low signal-to-noise ratios. As a detailed study, we demonstrate our scheme using XGBoost models trained on the Numerai dataset and show that a two layer deep ensemble of XGBoost models over different model snapshots delivers high quality predictions under different market regimes. We also show that the performance of XGBoost models with different number of boosting rounds in three scenarios (small, standard and large) is monotonically increasing with respect to model size and converges towards the generalisation upper bound. We also evaluate the robustness of the model under variability of different hyperparameters, such as model complexity and data sampling settings. Our model has low hardware requirements as no specialised neural architectures are used and each base model can be independently trained in parallel.
    Functional Neural Networks: Shift invariant models for functional data with applications to EEG classification. (arXiv:2301.05869v2 [cs.LG] UPDATED)
    It is desirable for statistical models to detect signals of interest independently of their position. If the data is generated by some smooth process, this additional structure should be taken into account. We introduce a new class of neural networks that are shift invariant and preserve smoothness of the data: functional neural networks (FNNs). For this, we use methods from functional data analysis (FDA) to extend multi-layer perceptrons and convolutional neural networks to functional data. We propose different model architectures, show that the models outperform a benchmark model from FDA in terms of accuracy and successfully use FNNs to classify electroencephalography (EEG) data.
    Distributed Out-of-Memory NMF on CPU/GPU Architectures. (arXiv:2202.09518v3 [cs.DC] UPDATED)
    We propose an efficient distributed out-of-memory implementation of the Non-negative Matrix Factorization (NMF) algorithm for heterogeneous high-performance-computing (HPC) systems. The proposed implementation is based on prior work on NMFk, which can perform automatic model selection and extract latent variables and patterns from data. In this work, we extend NMFk by adding support for dense and sparse matrix operation on multi-node, multi-GPU systems. The resulting algorithm is optimized for out-of-memory (OOM) problems where the memory required to factorize a given matrix is greater than the available GPU memory. Memory complexity is reduced by batching/tiling strategies, and sparse and dense matrix operations are significantly accelerated with GPU cores (or tensor cores when available). Input/Output (I/O) latency associated with batch copies between host and device is hidden using CUDA streams to overlap data transfers and compute asynchronously, and latency associated with collective communications (both intra-node and inter-node) is reduced using optimized NVIDIA Collective Communication Library NCCL based communicators. Benchmark results show significant improvement, from 32X to 76x speedup, with the new implementation using GPUs over the CPU-based NMFk. Good weak scaling was demonstrated on up to 4096 multi-GPU cluster nodes with approximately 25,000 GPUs when decomposing a dense 340 Terabyte-size matrix and an 11 Exabyte-size sparse matrix of density 10e-6.
    $\mathcal{G}^2Pxy$: Generative Open-Set Node Classification on Graphs with Proxy Unknowns. (arXiv:2308.05463v1 [cs.LG])
    Node classification is the task of predicting the labels of unlabeled nodes in a graph. State-of-the-art methods based on graph neural networks achieve excellent performance when all labels are available during training. But in real-life, models are often applied on data with new classes, which can lead to massive misclassification and thus significantly degrade performance. Hence, developing open-set classification methods is crucial to determine if a given sample belongs to a known class. Existing methods for open-set node classification generally use transductive learning with part or all of the features of real unseen class nodes to help with open-set classification. In this paper, we propose a novel generative open-set node classification method, i.e. $\mathcal{G}^2Pxy$, which follows a stricter inductive learning setting where no information about unknown classes is available during training and validation. Two kinds of proxy unknown nodes, inter-class unknown proxies and external unknown proxies are generated via mixup to efficiently anticipate the distribution of novel classes. Using the generated proxies, a closed-set classifier can be transformed into an open-set one, by augmenting it with an extra proxy classifier. Under the constraints of both cross entropy loss and complement entropy loss, $\mathcal{G}^2Pxy$ achieves superior effectiveness for unknown class detection and known class classification, which is validated by experiments on benchmark graph datasets. Moreover, $\mathcal{G}^2Pxy$ does not have specific requirement on the GNN architecture and shows good generalizations.
    IIHT: Medical Report Generation with Image-to-Indicator Hierarchical Transformer. (arXiv:2308.05633v1 [cs.CV])
    Automated medical report generation has become increasingly important in medical analysis. It can produce computer-aided diagnosis descriptions and thus significantly alleviate the doctors' work. Inspired by the huge success of neural machine translation and image captioning, various deep learning methods have been proposed for medical report generation. However, due to the inherent properties of medical data, including data imbalance and the length and correlation between report sequences, the generated reports by existing methods may exhibit linguistic fluency but lack adequate clinical accuracy. In this work, we propose an image-to-indicator hierarchical transformer (IIHT) framework for medical report generation. It consists of three modules, i.e., a classifier module, an indicator expansion module and a generator module. The classifier module first extracts image features from the input medical images and produces disease-related indicators with their corresponding states. The disease-related indicators are subsequently utilised as input for the indicator expansion module, incorporating the "data-text-data" strategy. The transformer-based generator then leverages these extracted features along with image features as auxiliary information to generate final reports. Furthermore, the proposed IIHT method is feasible for radiologists to modify disease indicators in real-world scenarios and integrate the operations into the indicator expansion module for fluent and accurate medical report generation. Extensive experiments and comparisons with state-of-the-art methods under various evaluation metrics demonstrate the great performance of the proposed method.
    Normalized Gradients for All. (arXiv:2308.05621v1 [cs.LG])
    In this short note, I show how to adapt to H\"{o}lder smoothness using normalized gradients in a black-box way. Moreover, the bound will depend on a novel notion of local H\"{o}lder smoothness. The main idea directly comes from Levy [2017].
    Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment. (arXiv:2308.05374v1 [cs.AI])
    Ensuring alignment, which refers to making models behave in accordance with human intentions [1,2], has become a critical task before deploying large language models (LLMs) in real-world applications. For instance, OpenAI devoted six months to iteratively aligning GPT-4 before its release [3]. However, a major challenge faced by practitioners is the lack of clear guidance on evaluating whether LLM outputs align with social norms, values, and regulations. This obstacle hinders systematic iteration and deployment of LLMs. To address this issue, this paper presents a comprehensive survey of key dimensions that are crucial to consider when assessing LLM trustworthiness. The survey covers seven major categories of LLM trustworthiness: reliability, safety, fairness, resistance to misuse, explainability and reasoning, adherence to social norms, and robustness. Each major category is further divided into several sub-categories, resulting in a total of 29 sub-categories. Additionally, a subset of 8 sub-categories is selected for further investigation, where corresponding measurement studies are designed and conducted on several widely-used LLMs. The measurement results indicate that, in general, more aligned models tend to perform better in terms of overall trustworthiness. However, the effectiveness of alignment varies across the different trustworthiness categories considered. This highlights the importance of conducting more fine-grained analyses, testing, and making continuous improvements on LLM alignment. By shedding light on these key dimensions of LLM trustworthiness, this paper aims to provide valuable insights and guidance to practitioners in the field. Understanding and addressing these concerns will be crucial in achieving reliable and ethically sound deployment of LLMs in various applications.
    Critical Points ++: An Agile Point Cloud Importance Measure for Robust Classification, Adversarial Defense and Explainable AI. (arXiv:2308.05525v1 [cs.CV])
    The ability to cope accurately and fast with Out-Of-Distribution (OOD) samples is crucial in real-world safety demanding applications. In this work we first study the interplay between critical points of 3D point clouds and OOD samples. Our findings are that common corruptions and outliers are often interpreted as critical points. We generalize the notion of critical points into importance measures. We show that training a classification network based only on less important points dramatically improves robustness, at a cost of minor performance loss on the clean set. We observe that normalized entropy is highly informative for corruption analysis. An adaptive threshold based on normalized entropy is suggested for selecting the set of uncritical points. Our proposed importance measure is extremely fast to compute. We show it can be used for a variety of applications, such as Explainable AI (XAI), Outlier Removal, Uncertainty Estimation, Robust Classification and Adversarial Defense. We reach SOTA results on the two latter tasks.
    AutoGluon-TimeSeries: AutoML for Probabilistic Time Series Forecasting. (arXiv:2308.05566v1 [cs.LG])
    We introduce AutoGluon-TimeSeries - an open-source AutoML library for probabilistic time series forecasting. Focused on ease of use and robustness, AutoGluon-TimeSeries enables users to generate accurate point and quantile forecasts with just 3 lines of Python code. Built on the design philosophy of AutoGluon, AutoGluon-TimeSeries leverages ensembles of diverse forecasting models to deliver high accuracy within a short training time. AutoGluon-TimeSeries combines both conventional statistical models, machine-learning based forecasting approaches, and ensembling techniques. In our evaluation on 29 benchmark datasets, AutoGluon-TimeSeries demonstrates strong empirical performance, outperforming a range of forecasting methods in terms of both point and quantile forecast accuracy, and often even improving upon the best-in-hindsight combination of prior methods.
    Explainable AI applications in the Medical Domain: a systematic review. (arXiv:2308.05411v1 [cs.AI])
    Artificial Intelligence in Medicine has made significant progress with emerging applications in medical imaging, patient care, and other areas. While these applications have proven successful in retrospective studies, very few of them were applied in practice.The field of Medical AI faces various challenges, in terms of building user trust, complying with regulations, using data ethically.Explainable AI (XAI) aims to enable humans understand AI and trust its results. This paper presents a literature review on the recent developments of XAI solutions for medical decision support, based on a representative sample of 198 articles published in recent years. The systematic synthesis of the relevant articles resulted in several findings. (1) model-agnostic XAI techniques were mostly employed in these solutions, (2) deep learning models are utilized more than other types of machine learning models, (3) explainability was applied to promote trust, but very few works reported the physicians participation in the loop, (4) visual and interactive user interface is more useful in understanding the explanation and the recommendation of the system. More research is needed in collaboration between medical and AI experts, that could guide the development of suitable frameworks for the design, implementation, and evaluation of XAI solutions in medicine.
    A Comparative Assessment of Multi-view fusion learning for Crop Classification. (arXiv:2308.05407v1 [cs.CV])
    With a rapidly increasing amount and diversity of remote sensing (RS) data sources, there is a strong need for multi-view learning modeling. This is a complex task when considering the differences in resolution, magnitude, and noise of RS data. The typical approach for merging multiple RS sources has been input-level fusion, but other - more advanced - fusion strategies may outperform this traditional approach. This work assesses different fusion strategies for crop classification in the CropHarvest dataset. The fusion methods proposed in this work outperform models based on individual views and previous fusion methods. We do not find one single fusion method that consistently outperforms all other approaches. Instead, we present a comparison of multi-view fusion methods for three different datasets and show that, depending on the test region, different methods obtain the best performance. Despite this, we suggest a preliminary criterion for the selection of fusion methods.
    Exploring Machine Learning and Transformer-based Approaches for Deceptive Text Classification: A Comparative Analysis. (arXiv:2308.05476v1 [cs.CL])
    Deceptive text classification is a critical task in natural language processing that aims to identify deceptive or fraudulent content. This study presents a comparative analysis of machine learning and transformer-based approaches for deceptive text classification. We investigate the effectiveness of traditional machine learning algorithms and state-of-the-art transformer models, such as BERT, XLNET, DistilBERT, and RoBERTa, in detecting deceptive text. A labeled dataset consisting of deceptive and non-deceptive texts is used for training and evaluation purposes. Through extensive experimentation, we compare the performance metrics, including accuracy, precision, recall, and F1 score, of the different approaches. The results of this study shed light on the strengths and limitations of machine learning and transformer-based methods for deceptive text classification, enabling researchers and practitioners to make informed decisions when dealing with deceptive content
    FINER: Enhancing State-of-the-art Classifiers with Feature Attribution to Facilitate Security Analysis. (arXiv:2308.05362v1 [cs.CR])
    Deep learning classifiers achieve state-of-the-art performance in various risk detection applications. They explore rich semantic representations and are supposed to automatically discover risk behaviors. However, due to the lack of transparency, the behavioral semantics cannot be conveyed to downstream security experts to reduce their heavy workload in security analysis. Although feature attribution (FA) methods can be used to explain deep learning, the underlying classifier is still blind to what behavior is suspicious, and the generated explanation cannot adapt to downstream tasks, incurring poor explanation fidelity and intelligibility. In this paper, we propose FINER, the first framework for risk detection classifiers to generate high-fidelity and high-intelligibility explanations. The high-level idea is to gather explanation efforts from model developer, FA designer, and security experts. To improve fidelity, we fine-tune the classifier with an explanation-guided multi-task learning strategy. To improve intelligibility, we engage task knowledge to adjust and ensemble FA methods. Extensive evaluations show that FINER improves explanation quality for risk detection. Moreover, we demonstrate that FINER outperforms a state-of-the-art tool in facilitating malware analysis.
    Conformer-based Target-Speaker Automatic Speech Recognition for Single-Channel Audio. (arXiv:2308.05218v1 [cs.SD])
    We propose CONF-TSASR, a non-autoregressive end-to-end time-frequency domain architecture for single-channel target-speaker automatic speech recognition (TS-ASR). The model consists of a TitaNet based speaker embedding module, a Conformer based masking as well as ASR modules. These modules are jointly optimized to transcribe a target-speaker, while ignoring speech from other speakers. For training we use Connectionist Temporal Classification (CTC) loss and introduce a scale-invariant spectrogram reconstruction loss to encourage the model better separate the target-speaker's spectrogram from mixture. We obtain state-of-the-art target-speaker word error rate (TS-WER) on WSJ0-2mix-extr (4.2%). Further, we report for the first time TS-WER on WSJ0-3mix-extr (12.4%), LibriSpeech2Mix (4.2%) and LibriSpeech3Mix (7.6%) datasets, establishing new benchmarks for TS-ASR. The proposed model will be open-sourced through NVIDIA NeMo toolkit.
    Leveraging the Edge and Cloud for V2X-Based Real-Time Object Detection in Autonomous Driving. (arXiv:2308.05234v1 [cs.CV])
    Environmental perception is a key element of autonomous driving because the information received from the perception module influences core driving decisions. An outstanding challenge in real-time perception for autonomous driving lies in finding the best trade-off between detection quality and latency. Major constraints on both computation and power have to be taken into account for real-time perception in autonomous vehicles. Larger object detection models tend to produce the best results, but are also slower at runtime. Since the most accurate detectors cannot run in real-time locally, we investigate the possibility of offloading computation to edge and cloud platforms, which are less resource-constrained. We create a synthetic dataset to train object detection models and evaluate different offloading strategies. Using real hardware and network simulations, we compare different trade-offs between prediction quality and end-to-end delay. Since sending raw frames over the network implies additional transmission delays, we also explore the use of JPEG and H.265 compression at varying qualities and measure their impact on prediction metrics. We show that models with adequate compression can be run in real-time on the cloud while outperforming local detection performance.
    Data-driven Intra-Autonomous Systems Graph Generator. (arXiv:2308.05254v1 [cs.NI])
    This paper introduces a novel deep-learning based generator of synthetic graphs that represent intra-Autonomous System (AS) in the Internet, named Deep-generative graphs for the Internet (DGGI). It also presents a novel massive dataset of real intra-AS graphs extracted from the project Internet Topology Data Kit (ITDK), called Internet Graphs (IGraphs). To create IGraphs, the Filtered Recurrent Multi-level (FRM) algorithm for community extraction was developed. It is shown that DGGI creates synthetic graphs which accurately reproduce the properties of centrality, clustering, assortativity, and node degree. The DGGI generator overperforms existing Internet topology generators. On average, DGGI improves the Maximum Mean Discrepancy (MMD) metric 84.4%, 95.1%, 97.9%, and 94.7% for assortativity, betweenness, clustering, and node degree, respectively.
    OpenProteinSet: Training data for structural biology at scale. (arXiv:2308.05326v1 [q-bio.BM])
    Multiple sequence alignments (MSAs) of proteins encode rich biological information and have been workhorses in bioinformatic methods for tasks like protein design and protein structure prediction for decades. Recent breakthroughs like AlphaFold2 that use transformers to attend directly over large quantities of raw MSAs have reaffirmed their importance. Generation of MSAs is highly computationally intensive, however, and no datasets comparable to those used to train AlphaFold2 have been made available to the research community, hindering progress in machine learning for proteins. To remedy this problem, we introduce OpenProteinSet, an open-source corpus of more than 16 million MSAs, associated structural homologs from the Protein Data Bank, and AlphaFold2 protein structure predictions. We have previously demonstrated the utility of OpenProteinSet by successfully retraining AlphaFold2 on it. We expect OpenProteinSet to be broadly useful as training and validation data for 1) diverse tasks focused on protein structure, function, and design and 2) large-scale multimodal machine learning research.
    Machine Learning aided Computer Architecture Design for CNN Inferencing Systems. (arXiv:2308.05364v1 [cs.AR])
    Efficient and timely calculations of Machine Learning (ML) algorithms are essential for emerging technologies like autonomous driving, the Internet of Things (IoT), and edge computing. One of the primary ML algorithms used in such systems is Convolutional Neural Networks (CNNs), which demand high computational resources. This requirement has led to the use of ML accelerators like GPGPUs to meet design constraints. However, selecting the most suitable accelerator involves Design Space Exploration (DSE), a process that is usually time-consuming and requires significant manual effort. Our work presents approaches to expedite the DSE process by identifying the most appropriate GPGPU for CNN inferencing systems. We have developed a quick and precise technique for forecasting the power and performance of CNNs during inference, with a MAPE of 5.03% and 5.94%, respectively. Our approach empowers computer architects to estimate power and performance in the early stages of development, reducing the necessity for numerous prototypes. This saves time and money while also improving the time-to-market period.
    Flexible Isosurface Extraction for Gradient-Based Mesh Optimization. (arXiv:2308.05371v1 [cs.GR])
    This work considers gradient-based mesh optimization, where we iteratively optimize for a 3D surface mesh by representing it as the isosurface of a scalar field, an increasingly common paradigm in applications including photogrammetry, generative modeling, and inverse physics. Existing implementations adapt classic isosurface extraction algorithms like Marching Cubes or Dual Contouring; these techniques were designed to extract meshes from fixed, known fields, and in the optimization setting they lack the degrees of freedom to represent high-quality feature-preserving meshes, or suffer from numerical instabilities. We introduce FlexiCubes, an isosurface representation specifically designed for optimizing an unknown mesh with respect to geometric, visual, or even physical objectives. Our main insight is to introduce additional carefully-chosen parameters into the representation, which allow local flexible adjustments to the extracted mesh geometry and connectivity. These parameters are updated along with the underlying scalar field via automatic differentiation when optimizing for a downstream task. We base our extraction scheme on Dual Marching Cubes for improved topological properties, and present extensions to optionally generate tetrahedral and hierarchically-adaptive meshes. Extensive experiments validate FlexiCubes on both synthetic benchmarks and real-world applications, showing that it offers significant improvements in mesh quality and geometric fidelity.
    Training neural networks with end-to-end optical backpropagation. (arXiv:2308.05226v1 [physics.optics])
    Optics is an exciting route for the next generation of computing hardware for machine learning, promising several orders of magnitude enhancement in both computational speed and energy efficiency. However, to reach the full capacity of an optical neural network it is necessary that the computing not only for the inference, but also for the training be implemented optically. The primary algorithm for training a neural network is backpropagation, in which the calculation is performed in the order opposite to the information flow for inference. While straightforward in a digital computer, optical implementation of backpropagation has so far remained elusive, particularly because of the conflicting requirements for the optical element that implements the nonlinear activation function. In this work, we address this challenge for the first time with a surprisingly simple and generic scheme. Saturable absorbers are employed for the role of the activation units, and the required properties are achieved through a pump-probe process, in which the forward propagating signal acts as the pump and backward as the probe. Our approach is adaptable to various analog platforms, materials, and network structures, and it demonstrates the possibility of constructing neural networks entirely reliant on analog optical processes for both training and inference tasks.
    AI-Enabled Software and System Architecture Frameworks: Focusing on smart Cyber-Physical Systems (CPS). (arXiv:2308.05239v1 [cs.SE])
    Several architecture frameworks for software, systems, and enterprises have been proposed in the literature. They identified various stakeholders and defined architecture viewpoints and views to frame and address stakeholder concerns. However, the stakeholders with data science and Machine Learning (ML) related concerns, such as data scientists and data engineers, are yet to be included in existing architecture frameworks. Therefore, they failed to address the architecture viewpoints and views responsive to the concerns of the data science community. In this paper, we address this gap by establishing the architecture frameworks adapted to meet the requirements of modern applications and organizations where ML artifacts are both prevalent and crucial. In particular, we focus on ML-enabled Cyber-Physical Systems (CPSs) and propose two sets of merit criteria for their efficient development and performance assessment, namely the criteria for evaluating and benchmarking ML-enabled CPSs, and the criteria for evaluation and benchmarking of the tools intended to support users through the modeling and development pipeline. In this study, we deploy multiple empirical and qualitative research methods based on literature review and survey instruments including expert interviews and an online questionnaire. We collect, analyze, and integrate the opinions of 77 experts from more than 25 organizations in over 10 countries to devise and validate the proposed framework.  ( 2 min )
    Homophily-enhanced Structure Learning for Graph Clustering. (arXiv:2308.05309v1 [cs.LG])
    Graph clustering is a fundamental task in graph analysis, and recent advances in utilizing graph neural networks (GNNs) have shown impressive results. Despite the success of existing GNN-based graph clustering methods, they often overlook the quality of graph structure, which is inherent in real-world graphs due to their sparse and multifarious nature, leading to subpar performance. Graph structure learning allows refining the input graph by adding missing links and removing spurious connections. However, previous endeavors in graph structure learning have predominantly centered around supervised settings, and cannot be directly applied to our specific clustering tasks due to the absence of ground-truth labels. To bridge the gap, we propose a novel method called \textbf{ho}mophily-enhanced structure \textbf{le}arning for graph clustering (HoLe). Our motivation stems from the observation that subtly enhancing the degree of homophily within the graph structure can significantly improve GNNs and clustering outcomes. To realize this objective, we develop two clustering-oriented structure learning modules, i.e., hierarchical correlation estimation and cluster-aware sparsification. The former module enables a more accurate estimation of pairwise node relationships by leveraging guidance from latent and clustering spaces, while the latter one generates a sparsified structure based on the similarity matrix and clustering assignments. Additionally, we devise a joint optimization approach alternating between training the homophily-enhanced structure learning and GNN-based clustering, thereby enforcing their reciprocal effects. Extensive experiments on seven benchmark datasets of various types and scales, across a range of clustering metrics, demonstrate the superiority of HoLe against state-of-the-art baselines.  ( 3 min )
    Hard No-Box Adversarial Attack on Skeleton-Based Human Action Recognition with Skeleton-Motion-Informed Gradient. (arXiv:2308.05681v1 [cs.CV])
    Recently, methods for skeleton-based human activity recognition have been shown to be vulnerable to adversarial attacks. However, these attack methods require either the full knowledge of the victim (i.e. white-box attacks), access to training data (i.e. transfer-based attacks) or frequent model queries (i.e. black-box attacks). All their requirements are highly restrictive, raising the question of how detrimental the vulnerability is. In this paper, we show that the vulnerability indeed exists. To this end, we consider a new attack task: the attacker has no access to the victim model or the training data or labels, where we coin the term hard no-box attack. Specifically, we first learn a motion manifold where we define an adversarial loss to compute a new gradient for the attack, named skeleton-motion-informed (SMI) gradient. Our gradient contains information of the motion dynamics, which is different from existing gradient-based attack methods that compute the loss gradient assuming each dimension in the data is independent. The SMI gradient can augment many gradient-based attack methods, leading to a new family of no-box attack methods. Extensive evaluation and comparison show that our method imposes a real threat to existing classifiers. They also show that the SMI gradient improves the transferability and imperceptibility of adversarial samples in both no-box and transfer-based black-box settings.  ( 2 min )
    Rethinking Integration of Prediction and Planning in Deep Learning-Based Automated Driving Systems: A Review. (arXiv:2308.05731v1 [cs.RO])
    Automated driving has the potential to revolutionize personal, public, and freight mobility. Besides the enormous challenge of perception, i.e. accurately perceiving the environment using available sensor data, automated driving comprises planning a safe, comfortable, and efficient motion trajectory. To promote safety and progress, many works rely on modules that predict the future motion of surrounding traffic. Modular automated driving systems commonly handle prediction and planning as sequential separate tasks. While this accounts for the influence of surrounding traffic on the ego-vehicle, it fails to anticipate the reactions of traffic participants to the ego-vehicle's behavior. Recent works suggest that integrating prediction and planning in an interdependent joint step is necessary to achieve safe, efficient, and comfortable driving. While various models implement such integrated systems, a comprehensive overview and theoretical understanding of different principles are lacking. We systematically review state-of-the-art deep learning-based prediction, planning, and integrated prediction and planning models. Different facets of the integration ranging from model architecture and model design to behavioral aspects are considered and related to each other. Moreover, we discuss the implications, strengths, and limitations of different integration methods. By pointing out research gaps, describing relevant future challenges, and highlighting trends in the research field, we identify promising directions for future research.
    Scaling Data Generation in Vision-and-Language Navigation. (arXiv:2307.15644v2 [cs.CV] UPDATED)
    Recent research in language-guided visual navigation has demonstrated a significant demand for the diversity of traversable environments and the quantity of supervision for training generalizable agents. To tackle the common data scarcity issue in existing vision-and-language navigation datasets, we propose an effective paradigm for generating large-scale data for learning, which applies 1200+ photo-realistic environments from HM3D and Gibson datasets and synthesizes 4.9 million instruction trajectory pairs using fully-accessible resources on the web. Importantly, we investigate the influence of each component in this paradigm on the agent's performance and study how to adequately apply the augmented data to pre-train and fine-tune an agent. Thanks to our large-scale dataset, the performance of an existing agent can be pushed up (+11% absolute with regard to previous SoTA) to a significantly new best of 80% single-run success rate on the R2R test split by simple imitation learning. The long-lasting generalization gap between navigating in seen and unseen environments is also reduced to less than 1% (versus 8% in the previous best method). Moreover, our paradigm also facilitates different models to achieve new state-of-the-art navigation results on CVDN, REVERIE, and R2R in continuous environments.
    Hierarchical Representations for Spatio-Temporal Visual Attention Modeling and Understanding. (arXiv:2308.05189v1 [cs.CV])
    This PhD. Thesis concerns the study and development of hierarchical representations for spatio-temporal visual attention modeling and understanding in video sequences. More specifically, we propose two computational models for visual attention. First, we present a generative probabilistic model for context-aware visual attention modeling and understanding. Secondly, we develop a deep network architecture for visual attention modeling, which first estimates top-down spatio-temporal visual attention, and ultimately serves for modeling attention in the temporal domain.  ( 2 min )
    Models Matter: The Impact of Single-Step Retrosynthesis on Synthesis Planning. (arXiv:2308.05522v1 [cs.AI])
    Retrosynthesis consists of breaking down a chemical compound recursively step-by-step into molecular precursors until a set of commercially available molecules is found with the goal to provide a synthesis route. Its two primary research directions, single-step retrosynthesis prediction, which models the chemical reaction logic, and multi-step synthesis planning, which tries to find the correct sequence of reactions, are inherently intertwined. Still, this connection is not reflected in contemporary research. In this work, we combine these two major research directions by applying multiple single-step retrosynthesis models within multi-step synthesis planning and analyzing their impact using public and proprietary reaction data. We find a disconnection between high single-step performance and potential route-finding success, suggesting that single-step models must be evaluated within synthesis planning in the future. Furthermore, we show that the commonly used single-step retrosynthesis benchmark dataset USPTO-50k is insufficient as this evaluation task does not represent model performance and scalability on larger and more diverse datasets. For multi-step synthesis planning, we show that the choice of the single-step model can improve the overall success rate of synthesis planning by up to +28% compared to the commonly used baseline model. Finally, we show that each single-step model finds unique synthesis routes, and differs in aspects such as route-finding success, the number of found synthesis routes, and chemical validity, making the combination of single-step retrosynthesis prediction and multi-step synthesis planning a crucial aspect when developing future methods.  ( 3 min )
    From NeurODEs to AutoencODEs: a mean-field control framework for width-varying Neural Networks. (arXiv:2307.02279v2 [math.OC] UPDATED)
    The connection between Residual Neural Networks (ResNets) and continuous-time control systems (known as NeurODEs) has led to a mathematical analysis of neural networks which has provided interesting results of both theoretical and practical significance. However, by construction, NeurODEs have been limited to describing constant-width layers, making them unsuitable for modeling deep learning architectures with layers of variable width. In this paper, we propose a continuous-time Autoencoder, which we call AutoencODE, based on a modification of the controlled field that drives the dynamics. This adaptation enables the extension of the mean-field control framework originally devised for conventional NeurODEs. In this setting, we tackle the case of low Tikhonov regularization, resulting in potentially non-convex cost landscapes. While the global results obtained for high Tikhonov regularization may not hold globally, we show that many of them can be recovered in regions where the loss function is locally convex. Inspired by our theoretical findings, we develop a training method tailored to this specific type of Autoencoders with residual connections, and we validate our approach through numerical experiments conducted on various examples.  ( 2 min )
    Benchmarking and Analyzing Robust Point Cloud Recognition: Bag of Tricks for Defending Adversarial Examples. (arXiv:2307.16361v2 [cs.CV] UPDATED)
    Deep Neural Networks (DNNs) for 3D point cloud recognition are vulnerable to adversarial examples, threatening their practical deployment. Despite the many research endeavors have been made to tackle this issue in recent years, the diversity of adversarial examples on 3D point clouds makes them more challenging to defend against than those on 2D images. For examples, attackers can generate adversarial examples by adding, shifting, or removing points. Consequently, existing defense strategies are hard to counter unseen point cloud adversarial examples. In this paper, we first establish a comprehensive, and rigorous point cloud adversarial robustness benchmark to evaluate adversarial robustness, which can provide a detailed understanding of the effects of the defense and attack methods. We then collect existing defense tricks in point cloud adversarial defenses and then perform extensive and systematic experiments to identify an effective combination of these tricks. Furthermore, we propose a hybrid training augmentation methods that consider various types of point cloud adversarial examples to adversarial training, significantly improving the adversarial robustness. By combining these tricks, we construct a more robust defense framework achieving an average accuracy of 83.45\% against various attacks, demonstrating its capability to enabling robust learners. Our codebase are open-sourced on: \url{https://github.com/qiufan319/benchmark_pc_attack.git}.  ( 3 min )
    Finding Already Debunked Narratives via Multistage Retrieval: Enabling Cross-Lingual, Cross-Dataset and Zero-Shot Learning. (arXiv:2308.05680v1 [cs.CL])
    The task of retrieving already debunked narratives aims to detect stories that have already been fact-checked. The successful detection of claims that have already been debunked not only reduces the manual efforts of professional fact-checkers but can also contribute to slowing the spread of misinformation. Mainly due to the lack of readily available data, this is an understudied problem, particularly when considering the cross-lingual task, i.e. the retrieval of fact-checking articles in a language different from the language of the online post being checked. This paper fills this gap by (i) creating a novel dataset to enable research on cross-lingual retrieval of already debunked narratives, using tweets as queries to a database of fact-checking articles; (ii) presenting an extensive experiment to benchmark fine-tuned and off-the-shelf multilingual pre-trained Transformer models for this task; and (iii) proposing a novel multistage framework that divides this cross-lingual debunk retrieval task into refinement and re-ranking stages. Results show that the task of cross-lingual retrieval of already debunked narratives is challenging and off-the-shelf Transformer models fail to outperform a strong lexical-based baseline (BM25). Nevertheless, our multistage retrieval framework is robust, outperforming BM25 in most scenarios and enabling cross-domain and zero-shot learning, without significantly harming the model's performance.  ( 2 min )
    SegMatch: A semi-supervised learning method for surgical instrument segmentation. (arXiv:2308.05232v1 [cs.CV])
    Surgical instrument segmentation is recognised as a key enabler to provide advanced surgical assistance and improve computer assisted interventions. In this work, we propose SegMatch, a semi supervised learning method to reduce the need for expensive annotation for laparoscopic and robotic surgical images. SegMatch builds on FixMatch, a widespread semi supervised classification pipeline combining consistency regularization and pseudo labelling, and adapts it for the purpose of segmentation. In our proposed SegMatch, the unlabelled images are weakly augmented and fed into the segmentation model to generate a pseudo-label to enforce the unsupervised loss against the output of the model for the adversarial augmented image on the pixels with a high confidence score. Our adaptation for segmentation tasks includes carefully considering the equivariance and invariance properties of the augmentation functions we rely on. To increase the relevance of our augmentations, we depart from using only handcrafted augmentations and introduce a trainable adversarial augmentation strategy. Our algorithm was evaluated on the MICCAI Instrument Segmentation Challenge datasets Robust-MIS 2019 and EndoVis 2017. Our results demonstrate that adding unlabelled data for training purposes allows us to surpass the performance of fully supervised approaches which are limited by the availability of training data in these challenges. SegMatch also outperforms a range of state-of-the-art semi-supervised learning semantic segmentation models in different labelled to unlabelled data ratios.  ( 2 min )
    Evaluating Pedestrian Trajectory Prediction Methods for the Application in Autonomous Driving. (arXiv:2308.05194v1 [cs.LG])
    In this paper, the state of the art in the field of pedestrian trajectory prediction is evaluated alongside the constant velocity model (CVM) with respect to its applicability in autonomous vehicles. The evaluation is conducted on the widely-used ETH/UCY dataset where the Average Displacement Error (ADE) and the Final Displacement Error (FDE) are reported. To align with requirements in real-world applications, modifications are made to the input features of the initially proposed models. An ablation study is conducted to examine the influence of the observed motion history on the prediction performance, thereby establishing a better understanding of its impact. Additionally, the inference time of each model is measured to evaluate the scalability of each model when confronted with varying amounts of agents. The results demonstrate that simple models remain competitive when generating single trajectories, and certain features commonly thought of as useful have little impact on the overall performance across different architectures. Based on these findings, recommendations are proposed to guide the future development of trajectory prediction algorithms.  ( 2 min )
    Efficient Variational Inference for Large Skew-t Copulas with Application to Intraday Equity Returns. (arXiv:2308.05564v1 [econ.EM])
    Large skew-t factor copula models are attractive for the modeling of financial data because they allow for asymmetric and extreme tail dependence. We show that the copula implicit in the skew-t distribution of Azzalini and Capitanio (2003) allows for a higher level of pairwise asymmetric dependence than two popular alternative skew-t copulas. Estimation of this copula in high dimensions is challenging, and we propose a fast and accurate Bayesian variational inference (VI) approach to do so. The method uses a conditionally Gaussian generative representation of the skew-t distribution to define an augmented posterior that can be approximated accurately. A fast stochastic gradient ascent algorithm is used to solve the variational optimization. The new methodology is used to estimate copula models for intraday returns from 2017 to 2021 on 93 U.S. equities. The copula captures substantial heterogeneity in asymmetric dependence over equity pairs, in addition to the variability in pairwise correlations. We show that intraday predictive densities from the skew-t copula are more accurate than from some other copula models, while portfolio selection strategies based on the estimated pairwise tail dependencies improve performance relative to the benchmark index.  ( 2 min )
    Privacy-Aware Compression for Federated Learning Through Numerical Mechanism Design. (arXiv:2211.03942v3 [cs.LG] UPDATED)
    In private federated learning (FL), a server aggregates differentially private updates from a large number of clients in order to train a machine learning model. The main challenge in this setting is balancing privacy with both classification accuracy of the learnt model as well as the number of bits communicated between the clients and server. Prior work has achieved a good trade-off by designing a privacy-aware compression mechanism, called the minimum variance unbiased (MVU) mechanism, that numerically solves an optimization problem to determine the parameters of the mechanism. This paper builds upon it by introducing a new interpolation procedure in the numerical design process that allows for a far more efficient privacy analysis. The result is the new Interpolated MVU mechanism that is more scalable, has a better privacy-utility trade-off, and provides SOTA results on communication-efficient private FL on a variety of datasets.  ( 2 min )
    Updating Clinical Risk Stratification Models Using Rank-Based Compatibility: Approaches for Evaluating and Optimizing Clinician-Model Team Performance. (arXiv:2308.05619v1 [stat.ML])
    As data shift or new data become available, updating clinical machine learning models may be necessary to maintain or improve performance over time. However, updating a model can introduce compatibility issues when the behavior of the updated model does not align with user expectations, resulting in poor user-model team performance. Existing compatibility measures depend on model decision thresholds, limiting their applicability in settings where models are used to generate rankings based on estimated risk. To address this limitation, we propose a novel rank-based compatibility measure, $C^R$, and a new loss function that aims to optimize discriminative performance while encouraging good compatibility. Applied to a case study in mortality risk stratification leveraging data from MIMIC, our approach yields more compatible models while maintaining discriminative performance compared to existing model selection techniques, with an increase in $C^R$ of $0.019$ ($95\%$ confidence interval: $0.005$, $0.035$). This work provides new tools to analyze and update risk stratification models used in clinical care.  ( 2 min )
    Structure in Reinforcement Learning: A Survey and Open Problems. (arXiv:2306.16021v2 [cs.LG] UPDATED)
    Reinforcement Learning (RL), bolstered by the expressive capabilities of Deep Neural Networks (DNNs) for function approximation, has demonstrated considerable success in numerous applications. However, its practicality in addressing various real-world scenarios, characterized by diverse and unpredictable dynamics, noisy signals, and large state and action spaces, remains limited. This limitation stems from issues such as poor data efficiency, limited generalization capabilities, a lack of safety guarantees, and the absence of interpretability, among other factors. To overcome these challenges and improve performance across these crucial metrics, one promising avenue is to incorporate additional structural information about the problem into the RL learning process. Various sub-fields of RL have proposed methods for incorporating such inductive biases. We amalgamate these diverse methodologies under a unified framework, shedding light on the role of structure in the learning problem, and classify these methods into distinct patterns of incorporating structure. By leveraging this comprehensive framework, we provide valuable insights into the challenges of structured RL and lay the groundwork for a design pattern perspective on RL research. This novel perspective paves the way for future advancements and aids in developing more effective and efficient RL algorithms that can potentially handle real-world scenarios better.  ( 2 min )
    Optimizing Performance of Feedforward and Convolutional Neural Networks through Dynamic Activation Functions. (arXiv:2308.05724v1 [cs.LG])
    Deep learning training training algorithms are a huge success in recent years in many fields including speech, text,image video etc. Deeper and deeper layers are proposed with huge success with resnet structures having around 152 layers. Shallow convolution neural networks(CNN's) are still an active research, where some phenomena are still unexplained. Activation functions used in the network are of utmost importance, as they provide non linearity to the networks. Relu's are the most commonly used activation function.We show a complex piece-wise linear(PWL) activation in the hidden layer. We show that these PWL activations work much better than relu activations in our networks for convolution neural networks and multilayer perceptrons. Result comparison in PyTorch for shallow and deep CNNs are given to further strengthen our case.
    EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis. (arXiv:2308.05725v1 [cs.CL])
    Recent work has shown that it is possible to resynthesize high-quality speech based, not on text, but on low bitrate discrete units that have been learned in a self-supervised fashion and can therefore capture expressive aspects of speech that are hard to transcribe (prosody, voice styles, non-verbal vocalization). The adoption of these methods is still limited by the fact that most speech synthesis datasets are read, severely limiting spontaneity and expressivity. Here, we introduce Expresso, a high-quality expressive speech dataset for textless speech synthesis that includes both read speech and improvised dialogues rendered in 26 spontaneous expressive styles. We illustrate the challenges and potentials of this dataset with an expressive resynthesis benchmark where the task is to encode the input in low-bitrate units and resynthesize it in a target voice while preserving content and style. We evaluate resynthesis quality with automatic metrics for different self-supervised discrete encoders, and explore tradeoffs between quality, bitrate and invariance to speaker and style. All the dataset, evaluation metrics and baseline models are open source
    Multi-metrics adaptively identifies backdoors in Federated learning. (arXiv:2303.06601v2 [cs.CR] UPDATED)
    The decentralized and privacy-preserving nature of federated learning (FL) makes it vulnerable to backdoor attacks aiming to manipulate the behavior of the resulting model on specific adversary-chosen inputs. However, most existing defenses based on statistical differences take effect only against specific attacks, especially when the malicious gradients are similar to benign ones or the data are highly non-independent and identically distributed (non-IID). In this paper, we revisit the distance-based defense methods and discover that i) Euclidean distance becomes meaningless in high dimensions and ii) malicious gradients with diverse characteristics cannot be identified by a single metric. To this end, we present a simple yet effective defense strategy with multi-metrics and dynamic weighting to identify backdoors adaptively. Furthermore, our novel defense has no reliance on predefined assumptions over attack settings or data distributions and little impact on benign performance. To evaluate the effectiveness of our approach, we conduct comprehensive experiments on different datasets under various attack settings, where our method achieves the best defensive performance. For instance, we achieve the lowest backdoor accuracy of 3.06% under the difficult Edge-case PGD, showing significant superiority over previous defenses. The results also demonstrate that our method can be well-adapted to a wide range of non-IID degrees without sacrificing the benign performance.
    Cross-heterogeneity Graph Few-shot Learning. (arXiv:2308.05275v1 [cs.LG])
    In recent years, heterogeneous graph few-shot learning has been proposed to address the label sparsity issue in heterogeneous graphs (HGs), which contain various types of nodes and edges. The existing methods have achieved good performance by transferring generalized knowledge extracted from rich-labeled classes in source HG(s) to few-labeled classes in a target HG. However, these methods only consider the single-heterogeneity scenario where the source and target HGs share a fixed set of node/edge types, ignoring the more general scenario of cross-heterogeneity, where each HG can have a different and non-fixed set of node/edge types. To this end, we focus on the unexplored cross-heterogeneity scenario and propose a novel model for Cross-heterogeneity Graph Few-shot Learning, namely CGFL. In CGFL, we first extract meta-patterns to capture heterogeneous information and propose a multi-view heterogeneous graph neural network (MHGN) to learn meta-patterns across HGs. Then, we propose a score module to measure the informativeness of labeled samples and determine the transferability of each source HG. Finally, by integrating MHGN and the score module into a meta-learning mechanism, CGFL can effectively transfer generalized knowledge to predict new classes with few-labeled data. Extensive experiments on four real-world datasets have demonstrated the superior performance of CGFL over the state-of-the-art methods.
    Byzantine-Robust Decentralized Stochastic Optimization with Stochastic Gradient Noise-Independent Learning Error. (arXiv:2308.05292v1 [cs.LG])
    This paper studies Byzantine-robust stochastic optimization over a decentralized network, where every agent periodically communicates with its neighbors to exchange local models, and then updates its own local model by stochastic gradient descent (SGD). The performance of such a method is affected by an unknown number of Byzantine agents, which conduct adversarially during the optimization process. To the best of our knowledge, there is no existing work that simultaneously achieves a linear convergence speed and a small learning error. We observe that the learning error is largely dependent on the intrinsic stochastic gradient noise. Motivated by this observation, we introduce two variance reduction methods, stochastic average gradient algorithm (SAGA) and loopless stochastic variance-reduced gradient (LSVRG), to Byzantine-robust decentralized stochastic optimization for eliminating the negative effect of the stochastic gradient noise. The two resulting methods, BRAVO-SAGA and BRAVO-LSVRG, enjoy both linear convergence speeds and stochastic gradient noise-independent learning errors. Such learning errors are optimal for a class of methods based on total variation (TV)-norm regularization and stochastic subgradient update. We conduct extensive numerical experiments to demonstrate their effectiveness under various Byzantine attacks.
    A Brief Review of Hypernetworks in Deep Learning. (arXiv:2306.06955v2 [cs.LG] UPDATED)
    Hypernetworks, or hypernets in short, are neural networks that generate weights for another neural network, known as the target network. They have emerged as a powerful deep learning technique that allows for greater flexibility, adaptability, dynamism, faster training, information sharing, and model compression etc. Hypernets have shown promising results in a variety of deep learning problems, including continual learning, causal inference, transfer learning, weight pruning, uncertainty quantification, zero-shot learning, natural language processing, and reinforcement learning etc. Despite their success across different problem settings, currently, there is no review available to inform the researchers about the developments and to help in utilizing hypernets. To fill this gap, we review the progress in hypernets. We present an illustrative example to train deep neural networks using hypernets and propose categorizing hypernets based on five design criteria as inputs, outputs, variability of inputs and outputs, and architecture of hypernets. We also review applications of hypernets across different deep learning problem settings, followed by a discussion of general scenarios where hypernets can be effectively employed. Finally, we discuss the challenges and future directions that remain under-explored in the field of hypernets. We believe that hypernetworks have the potential to revolutionize the field of deep learning. They offer a new way to design and train neural networks, and they have the potential to improve the performance of deep learning models on a variety of tasks. Through this review, we aim to inspire further advancements in deep learning through hypernetworks.
    SLEM: Machine Learning for Path Modeling and Causal Inference with Super Learner Equation Modeling. (arXiv:2308.04365v3 [stat.ML] UPDATED)
    Causal inference is a crucial goal of science, enabling researchers to arrive at meaningful conclusions regarding the predictions of hypothetical interventions using observational data. Path models, Structural Equation Models (SEMs), and, more generally, Directed Acyclic Graphs (DAGs), provide a means to unambiguously specify assumptions regarding the causal structure underlying a phenomenon. Unlike DAGs, which make very few assumptions about the functional and parametric form, SEM assumes linearity. This can result in functional misspecification which prevents researchers from undertaking reliable effect size estimation. In contrast, we propose Super Learner Equation Modeling, a path modeling technique integrating machine learning Super Learner ensembles. We empirically demonstrate its ability to provide consistent and unbiased estimates of causal effects, its competitive performance for linear models when compared with SEM, and highlight its superiority over SEM when dealing with non-linear relationships. We provide open-source code, and a tutorial notebook with example usage, accentuating the easy-to-use nature of the method.
    Neural Progressive Meshes. (arXiv:2308.05741v1 [cs.CV])
    The recent proliferation of 3D content that can be consumed on hand-held devices necessitates efficient tools for transmitting large geometric data, e.g., 3D meshes, over the Internet. Detailed high-resolution assets can pose a challenge to storage as well as transmission bandwidth, and level-of-detail techniques are often used to transmit an asset using an appropriate bandwidth budget. It is especially desirable for these methods to transmit data progressively, improving the quality of the geometry with more data. Our key insight is that the geometric details of 3D meshes often exhibit similar local patterns even across different shapes, and thus can be effectively represented with a shared learned generative space. We learn this space using a subdivision-based encoder-decoder architecture trained in advance on a large collection of surfaces. We further observe that additional residual features can be transmitted progressively between intermediate levels of subdivision that enable the client to control the tradeoff between bandwidth cost and quality of reconstruction, providing a neural progressive mesh representation. We evaluate our method on a diverse set of complex 3D shapes and demonstrate that it outperforms baselines in terms of compression ratio and reconstruction quality.
    A hybrid deep-learning-metaheuristic framework for bi-level network design problems. (arXiv:2303.06024v3 [cs.NE] UPDATED)
    This study proposes a hybrid deep-learning-metaheuristic framework with a bi-level architecture for road network design problems (NDPs). We train a graph neural network (GNN) to approximate the solution of the user equilibrium (UE) traffic assignment problem and use inferences made by the trained model to calculate fitness function evaluations of a genetic algorithm (GA) to approximate solutions for NDPs. Using three test networks, two NDP variants and an exact solver as benchmark, we show that on average, our proposed framework can provide solutions within 1.5% gap of the best results in less than 0.5% of the time used by the exact solution procedure. Our framework can be utilized within an expert system for infrastructure planning to determine the best infrastructure planning and management decisions under different scenarios. Given the flexibility of the framework, it can easily be adapted to many other decision problems that can be modeled as bi-level problems on graphs. Moreover, we foreseen interesting future research directions, thus we also put forward a brief research agenda for this topic. The key observation from our research that can shape future research is that the fitness function evaluation time using the inferences made by the GNN model was in the order of milliseconds, which points to an opportunity and a need for novel heuristics that 1) can cope well with noisy fitness function values provided by deep learning models, and 2) can use the significantly enlarged efficiency of the evaluation step to explore the search space effectively (rather than efficiently). This opens a new avenue for a modern class of metaheuristics that are crafted for use with AI-powered predictors.
    Diffusion Denoised Smoothing for Certified and Adversarial Robust Out-Of-Distribution Detection. (arXiv:2303.14961v3 [cs.LG] UPDATED)
    As the use of machine learning continues to expand, the importance of ensuring its safety cannot be overstated. A key concern in this regard is the ability to identify whether a given sample is from the training distribution, or is an "Out-Of-Distribution" (OOD) sample. In addition, adversaries can manipulate OOD samples in ways that lead a classifier to make a confident prediction. In this study, we present a novel approach for certifying the robustness of OOD detection within a $\ell_2$-norm around the input, regardless of network architecture and without the need for specific components or additional training. Further, we improve current techniques for detecting adversarial attacks on OOD samples, while providing high levels of certified and adversarial robustness on in-distribution samples. The average of all OOD detection metrics on CIFAR10/100 shows an increase of $\sim 13 \% / 5\%$ relative to previous approaches.
    A Comparison of Classical and Deep Reinforcement Learning Methods for HVAC Control. (arXiv:2308.05711v1 [cs.LG])
    Reinforcement learning (RL) is a promising approach for optimizing HVAC control. RL offers a framework for improving system performance, reducing energy consumption, and enhancing cost efficiency. We benchmark two popular classical and deep RL methods (Q-Learning and Deep-Q-Networks) across multiple HVAC environments and explore the practical consideration of model hyper-parameter selection and reward tuning. The findings provide insight for configuring RL agents in HVAC systems, promoting energy-efficient and cost-effective operation.
    Online learning techniques for prediction of temporal tabular datasets with regime changes. (arXiv:2301.00790v4 [q-fin.CP] UPDATED)
    The application of deep learning to non-stationary temporal datasets can lead to overfitted models that underperform under regime changes. In this work, we propose a modular machine learning pipeline for ranking predictions on temporal panel datasets which is robust under regime changes. The modularity of the pipeline allows the use of different models, including Gradient Boosting Decision Trees (GBDTs) and Neural Networks, with and without feature engineering. We evaluate our framework on financial data for stock portfolio prediction, and find that GBDT models with dropout display high performance, robustness and generalisability with reduced complexity and computational cost. We then demonstrate how online learning techniques, which require no retraining of models, can be used post-prediction to enhance the results. First, we show that dynamic feature projection improves robustness by reducing drawdown in regime changes. Second, we demonstrate that dynamical model ensembling based on selection of models with good recent performance leads to improved Sharpe and Calmar ratios of out-of-sample predictions. We also evaluate the robustness of our pipeline across different data splits and random seeds with good reproducibility.
    Simplifying Momentum-based Positive-definite Submanifold Optimization with Applications to Deep Learning. (arXiv:2302.09738v7 [stat.ML] UPDATED)
    Riemannian submanifold optimization with momentum is computationally challenging because, to ensure that the iterates remain on the submanifold, we often need to solve difficult differential equations. Here, we simplify such difficulties for a class of sparse or structured symmetric positive-definite matrices with the affine-invariant metric. We do so by proposing a generalized version of the Riemannian normal coordinates that dynamically orthonormalizes the metric and locally converts the problem into an unconstrained problem in the Euclidean space. We use our approach to simplify existing approaches for structured covariances and develop matrix-inverse-free $2^\text{nd}$-order optimizers for deep learning with low precision by using only matrix multiplications. Code: https://github.com/yorkerlin/StructuredNGD-DL
    Multi-graph Spatio-temporal Graph Convolutional Network for Traffic Flow Prediction. (arXiv:2308.05601v1 [cs.LG])
    Inter-city highway transportation is significant for urban life. As one of the key functions in intelligent transportation system (ITS), traffic evaluation always plays significant role nowadays, and daily traffic flow prediction still faces challenges at network-wide toll stations. On the one hand, the data imbalance in practice among various locations deteriorates the performance of prediction. On the other hand, complex correlative spatio-temporal factors cannot be comprehensively employed in long-term duration. In this paper, a prediction method is proposed for daily traffic flow in highway domain through spatio-temporal deep learning. In our method, data normalization strategy is used to deal with data imbalance, due to long-tail distribution of traffic flow at network-wide toll stations. And then, based on graph convolutional network, we construct networks in distinct semantics to capture spatio-temporal features. Beside that, meteorology and calendar features are used by our model in the full connection stage to extra external characteristics of traffic flow. By extensive experiments and case studies in one Chinese provincial highway, our method shows clear improvement in predictive accuracy than baselines and practical benefits in business.
    LLM As DBA. (arXiv:2308.05481v1 [cs.DB])
    Database administrators (DBAs) play a crucial role in managing, maintaining and optimizing a database system to ensure data availability, performance, and reliability. However, it is hard and tedious for DBAs to manage a large number of database instances (e.g., millions of instances on the cloud databases). Recently large language models (LLMs) have shown great potential to understand valuable documents and accordingly generate reasonable answers. Thus, we propose D-Bot, a LLM-based database administrator that can continuously acquire database maintenance experience from textual sources, and provide reasonable, well-founded, in-time diagnosis and optimization advice for target databases. This paper presents a revolutionary LLM-centric framework for database maintenance, including (i) database maintenance knowledge detection from documents and tools, (ii) tree of thought reasoning for root cause analysis, and (iii) collaborative diagnosis among multiple LLMs. Our preliminary experimental results that D-Bot can efficiently and effectively diagnose the root causes and our code is available at github.com/TsinghuaDatabaseGroup/DB-GPT.
    NUPES : Non-Uniform Post-Training Quantization via Power Exponent Search. (arXiv:2308.05600v1 [cs.LG])
    Deep neural network (DNN) deployment has been confined to larger hardware devices due to their expensive computational requirements. This challenge has recently reached another scale with the emergence of large language models (LLMs). In order to reduce both their memory footprint and latency, a promising technique is quantization. It consists in converting floating point representations to low bit-width fixed point representations, usually by assuming a uniform mapping onto a regular grid. This process, referred to in the literature as uniform quantization, may however be ill-suited as most DNN weights and activations follow a bell-shaped distribution. This is even worse on LLMs whose weight distributions are known to exhibit large, high impact, outlier values. In this work, we propose an improvement over the most commonly adopted way to tackle this limitation in deep learning models quantization, namely, non-uniform quantization. NUPES leverages automorphisms to preserve the scalar multiplications. Such transformations are derived from power functions. However, the optimization of the exponent parameter and weight values remains a challenging and novel problem which could not be solved with previous post training optimization techniques which only learn to round up or down weight values in order to preserve the predictive function. We circumvent this limitation with a new paradigm: learning new quantized weights over the entire quantized space. Similarly, we enable the optimization of the power exponent, i.e. the optimization of the quantization operator itself during training by alleviating all the numerical instabilities. The resulting predictive function is compatible with integer-only low-bit inference. We show the ability of the method to achieve state-of-the-art compression rates in both, data-free and data-driven configurations.
    RTLLM: An Open-Source Benchmark for Design RTL Generation with Large Language Model. (arXiv:2308.05345v1 [cs.LG])
    Inspired by the recent success of large language models (LLMs) like ChatGPT, researchers start to explore the adoption of LLMs for agile hardware design, such as generating design RTL based on natural-language instructions. However, in existing works, their target designs are all relatively simple and in a small scale, and proposed by the authors themselves, making a fair comparison among different LLM solutions challenging. In addition, many prior works only focus on the design correctness, without evaluating the design qualities of generated design RTL. In this work, we propose an open-source benchmark named RTLLM, for generating design RTL with natural language instructions. To systematically evaluate the auto-generated design RTL, we summarized three progressive goals, named syntax goal, functionality goal, and design quality goal. This benchmark can automatically provide a quantitative evaluation of any given LLM-based solution. Furthermore, we propose an easy-to-use yet surprisingly effective prompt engineering technique named self-planning, which proves to significantly boost the performance of GPT-3.5 in our proposed benchmark.
    A Forecaster's Review of Judea Pearl's Causality: Models, Reasoning and Inference, Second Edition, 2009. (arXiv:2308.05451v1 [stat.ME])
    With the big popularity and success of Judea Pearl's original causality book, this review covers the main topics updated in the second edition in 2009 and illustrates an easy-to-follow causal inference strategy in a forecast scenario. It further discusses some potential benefits and challenges for causal inference with time series forecasting when modeling the counterfactuals, estimating the uncertainty and incorporating prior knowledge to estimate causal effects in different forecasting scenarios.
    AI-GOMS: Large AI-Driven Global Ocean Modeling System. (arXiv:2308.03152v2 [physics.ao-ph] UPDATED)
    Ocean modeling is a powerful tool for simulating the physical, chemical, and biological processes of the ocean, which is the foundation for marine science research and operational oceanography. Modern numerical ocean modeling mainly consists of governing equations and numerical algorithms. Nonlinear instability, computational expense, low reusability efficiency and high coupling costs have gradually become the main bottlenecks for the further development of numerical ocean modeling. Recently, artificial intelligence-based modeling in scientific computing has shown revolutionary potential for digital twins and scientific simulations, but the bottlenecks of numerical ocean modeling have not been further solved. Here, we present AI-GOMS, a large AI-driven global ocean modeling system, for accurate and efficient global ocean daily prediction. AI-GOMS consists of a backbone model with the Fourier-based Masked Autoencoder structure for basic ocean variable prediction and lightweight fine-tuning models incorporating regional downscaling, wave decoding, and biochemistry coupling modules. AI-GOMS has achieved the best performance in 30 days of prediction for the global ocean basic variables with 15 depth layers at 1/4{\deg} spatial resolution. Beyond the good performance in statistical metrics, AI-GOMS realizes the simulation of mesoscale eddies in the Kuroshio region at 1/12{\deg} spatial resolution and ocean stratification in the tropical Pacific Ocean. AI-GOMS provides a new backbone-downstream paradigm for Earth system modeling, which makes the system transferable, scalable and reusable.
    Investigating disaster response through social media data and the Susceptible-Infected-Recovered (SIR) model: A case study of 2020 Western U.S. wildfire season. (arXiv:2308.05281v1 [cs.SI])
    Effective disaster response is critical for affected communities. Responders and decision-makers would benefit from reliable, timely measures of the issues impacting their communities during a disaster, and social media offers a potentially rich data source. Social media can reflect public concerns and demands during a disaster, offering valuable insights for decision-makers to understand evolving situations and optimize resource allocation. We used Bidirectional Encoder Representations from Transformers (BERT) topic modeling to cluster topics from Twitter data. Then, we conducted a temporal-spatial analysis to examine the distribution of these topics across different regions during the 2020 western U.S. wildfire season. Our results show that Twitter users mainly focused on three topics:"health impact," "damage," and "evacuation." We used the Susceptible-Infected-Recovered (SIR) theory to explore the magnitude and velocity of topic diffusion on Twitter. The results displayed a clear relationship between topic trends and wildfire propagation patterns. The estimated parameters obtained from the SIR model in selected cities revealed that residents exhibited a high level of several concerns during the wildfire. Our study details how the SIR model and topic modeling using social media data can provide decision-makers with a quantitative approach to measure disaster response and support their decision-making processes.
    Preemptive Detection of Fake Accounts on Social Networks via Multi-Class Preferential Attachment Classifiers. (arXiv:2308.05353v1 [cs.SI])
    In this paper, we describe a new algorithm called Preferential Attachment k-class Classifier (PreAttacK) for detecting fake accounts in a social network. Recently, several algorithms have obtained high accuracy on this problem. However, they have done so by relying on information about fake accounts' friendships or the content they share with others--the very things we seek to prevent. PreAttacK represents a significant departure from these approaches. We provide some of the first detailed distributional analyses of how new fake (and real) accounts first attempt to request friends after joining a major network (Facebook). We show that even before a new account has made friends or shared content, these initial friend request behaviors evoke a natural multi-class extension of the canonical Preferential Attachment model of social network growth. We use this model to derive a new algorithm, PreAttacK. We prove that in relevant problem instances, PreAttacK near-optimally approximates the posterior probability that a new account is fake under this multi-class Preferential Attachment model of new accounts' (not-yet-answered) friend requests. These are the first provable guarantees for fake account detection that apply to new users, and that do not require strong homophily assumptions. This principled approach also makes PreAttacK the only algorithm with provable guarantees that obtains state-of-the-art performance on new users on the global Facebook network, where it converges to AUC=0.9 after new users send + receive a total of just 20 not-yet-answered friend requests. For comparison, state-of-the-art benchmarks do not obtain this AUC even after observing additional data on new users' first 100 friend requests. Thus, unlike mainstream algorithms, PreAttacK converges before the median new fake account has made a single friendship (accepted friend request) with a human.
    Decoding Layer Saliency in Language Transformers. (arXiv:2308.05219v1 [cs.CL])
    In this paper, we introduce a strategy for identifying textual saliency in large-scale language models applied to classification tasks. In visual networks where saliency is more well-studied, saliency is naturally localized through the convolutional layers of the network; however, the same is not true in modern transformer-stack networks used to process natural language. We adapt gradient-based saliency methods for these networks, propose a method for evaluating the degree of semantic coherence of each layer, and demonstrate consistent improvement over numerous other methods for textual saliency on multiple benchmark classification datasets. Our approach requires no additional training or access to labelled data, and is comparatively very computationally efficient.
    Follow Anything: Open-set detection, tracking, and following in real-time. (arXiv:2308.05737v1 [cs.RO])
    Tracking and following objects of interest is critical to several robotics use cases, ranging from industrial automation to logistics and warehousing, to healthcare and security. In this paper, we present a robotic system to detect, track, and follow any object in real-time. Our approach, dubbed ``follow anything'' (FAn), is an open-vocabulary and multimodal model -- it is not restricted to concepts seen at training time and can be applied to novel classes at inference time using text, images, or click queries. Leveraging rich visual descriptors from large-scale pre-trained models (foundation models), FAn can detect and segment objects by matching multimodal queries (text, images, clicks) against an input image sequence. These detected and segmented objects are tracked across image frames, all while accounting for occlusion and object re-emergence. We demonstrate FAn on a real-world robotic system (a micro aerial vehicle) and report its ability to seamlessly follow the objects of interest in a real-time control loop. FAn can be deployed on a laptop with a lightweight (6-8 GB) graphics card, achieving a throughput of 6-20 frames per second. To enable rapid adoption, deployment, and extensibility, we open-source all our code on our project webpage at https://github.com/alaamaalouf/FollowAnything . We also encourage the reader the watch our 5-minutes explainer video in this https://www.youtube.com/watch?v=6Mgt3EPytrw .
    Learning ground states of gapped quantum Hamiltonians with Kernel Methods. (arXiv:2303.08902v2 [quant-ph] UPDATED)
    Neural network approaches to approximate the ground state of quantum hamiltonians require the numerical solution of a highly nonlinear optimization problem. We introduce a statistical learning approach that makes the optimization trivial by using kernel methods. Our scheme is an approximate realization of the power method, where supervised learning is used to learn the next step of the power iteration. We show that the ground state properties of arbitrary gapped quantum hamiltonians can be reached with polynomial resources under the assumption that the supervised learning is efficient. Using kernel ridge regression, we provide numerical evidence that the learning assumption is verified by applying our scheme to find the ground states of several prototypical interacting many-body quantum systems, both in one and two dimensions, showing the flexibility of our approach.
    On the Optimal Expressive Power of ReLU DNNs and Its Application in Approximation with Kolmogorov Superposition Theorem. (arXiv:2308.05509v1 [cs.LG])
    This paper is devoted to studying the optimal expressive power of ReLU deep neural networks (DNNs) and its application in approximation via the Kolmogorov Superposition Theorem. We first constructively prove that any continuous piecewise linear functions on $[0,1]$, comprising $O(N^2L)$ segments, can be represented by ReLU DNNs with $L$ hidden layers and $N$ neurons per layer. Subsequently, we demonstrate that this construction is optimal regarding the parameter count of the DNNs, achieved through investigating the shattering capacity of ReLU DNNs. Moreover, by invoking the Kolmogorov Superposition Theorem, we achieve an enhanced approximation rate for ReLU DNNs of arbitrary width and depth when dealing with continuous functions in high-dimensional spaces.
    Synthesizing Mixed-type Electronic Health Records using Diffusion Models. (arXiv:2302.14679v2 [cs.LG] UPDATED)
    Electronic Health Records (EHRs) contain sensitive patient information, which presents privacy concerns when sharing such data. Synthetic data generation is a promising solution to mitigate these risks, often relying on deep generative models such as Generative Adversarial Networks (GANs). However, recent studies have shown that diffusion models offer several advantages over GANs, such as generation of more realistic synthetic data and stable training in generating data modalities, including image, text, and sound. In this work, we investigate the potential of diffusion models for generating realistic mixed-type tabular EHRs, comparing TabDDPM model with existing methods on four datasets in terms of data quality, utility, privacy, and augmentation. Our experiments demonstrate that TabDDPM outperforms the state-of-the-art models across all evaluation metrics, except for privacy, which confirms the trade-off between privacy and utility.
    Provably Efficient Algorithm for Nonstationary Low-Rank MDPs. (arXiv:2308.05471v1 [cs.LG])
    Reinforcement learning (RL) under changing environment models many real-world applications via nonstationary Markov Decision Processes (MDPs), and hence gains considerable interest. However, theoretical studies on nonstationary MDPs in the literature have mainly focused on tabular and linear (mixture) MDPs, which do not capture the nature of unknown representation in deep RL. In this paper, we make the first effort to investigate nonstationary RL under episodic low-rank MDPs, where both transition kernels and rewards may vary over time, and the low-rank model contains unknown representation in addition to the linear state embedding function. We first propose a parameter-dependent policy optimization algorithm called PORTAL, and further improve PORTAL to its parameter-free version of Ada-PORTAL, which is able to tune its hyper-parameters adaptively without any prior knowledge of nonstationarity. For both algorithms, we provide upper bounds on the average dynamic suboptimality gap, which show that as long as the nonstationarity is not significantly large, PORTAL and Ada-PORTAL are sample-efficient and can achieve arbitrarily small average dynamic suboptimality gap with polynomial sample complexity.
    Shadow Datasets, New challenging datasets for Causal Representation Learning. (arXiv:2308.05707v1 [cs.LG])
    Discovering causal relations among semantic factors is an emergent topic in representation learning. Most causal representation learning (CRL) methods are fully supervised, which is impractical due to costly labeling. To resolve this restriction, weakly supervised CRL methods were introduced. To evaluate CRL performance, four existing datasets, Pendulum, Flow, CelebA(BEARD) and CelebA(SMILE), are utilized. However, existing CRL datasets are limited to simple graphs with few generative factors. Thus we propose two new datasets with a larger number of diverse generative factors and more sophisticated causal graphs. In addition, current real datasets, CelebA(BEARD) and CelebA(SMILE), the originally proposed causal graphs are not aligned with the dataset distributions. Thus, we propose modifications to them.
    Forecasting Irregularly Sampled Time Series using Graphs. (arXiv:2305.12932v2 [cs.LG] UPDATED)
    Forecasting irregularly sampled time series with missing values is a crucial task for numerous real-world applications such as healthcare, astronomy, and climate sciences. State-of-the-art approaches to this problem rely on Ordinary Differential Equations (ODEs) which are known to be slow and often require additional features to handle missing values. To address this issue, we propose a novel model using Graphs for Forecasting Irregularly Sampled Time Series with missing values which we call GraFITi. GraFITi first converts the time series to a Sparsity Structure Graph which is a sparse bipartite graph, and then reformulates the forecasting problem as the edge weight prediction task in the graph. It uses the power of Graph Neural Networks to learn the graph and predict the target edge weights. GraFITi has been tested on 3 real-world and 1 synthetic irregularly sampled time series dataset with missing values and compared with various state-of-the-art models. The experimental results demonstrate that GraFITi improves the forecasting accuracy by up to 17% and reduces the run time up to 5 times compared to the state-of-the-art forecasting models.
    AST-MHSA : Code Summarization using Multi-Head Self-Attention. (arXiv:2308.05646v1 [cs.CL])
    Code summarization aims to generate concise natural language descriptions for source code. The prevailing approaches adopt transformer-based encoder-decoder architectures, where the Abstract Syntax Tree (AST) of the source code is utilized for encoding structural information. However, ASTs are much longer than the corresponding source code, and existing methods ignore this size constraint by directly feeding the entire linearized AST into the encoders. This simplistic approach makes it challenging to extract truly valuable dependency relations from the overlong input sequence and leads to significant computational overhead due to self-attention applied to all nodes in the AST. To address this issue effectively and efficiently, we present a model, AST-MHSA that uses multi-head attention to extract the important semantic information from the AST. The model consists of two main components: an encoder and a decoder. The encoder takes as input the abstract syntax tree (AST) of the code and generates a sequence of hidden states. The decoder then takes these hidden states as input and generates a natural language summary of the code. The multi-head attention mechanism allows the model to learn different representations of the input code, which can be combined to generate a more comprehensive summary. The model is trained on a dataset of code and summaries, and the parameters of the model are optimized to minimize the loss between the generated summaries and the ground-truth summaries.
    PDE-Refiner: Achieving Accurate Long Rollouts with Neural PDE Solvers. (arXiv:2308.05732v1 [cs.LG])
    Time-dependent partial differential equations (PDEs) are ubiquitous in science and engineering. Recently, mostly due to the high computational cost of traditional solution techniques, deep neural network based surrogates have gained increased interest. The practical utility of such neural PDE solvers relies on their ability to provide accurate, stable predictions over long time horizons, which is a notoriously hard problem. In this work, we present a large-scale analysis of common temporal rollout strategies, identifying the neglect of non-dominant spatial frequency information, often associated with high frequencies in PDE solutions, as the primary pitfall limiting stable, accurate rollout performance. Based on these insights, we draw inspiration from recent advances in diffusion models to introduce PDE-Refiner; a novel model class that enables more accurate modeling of all frequency components via a multistep refinement process. We validate PDE-Refiner on challenging benchmarks of complex fluid dynamics, demonstrating stable and accurate rollouts that consistently outperform state-of-the-art models, including neural, numerical, and hybrid neural-numerical architectures. We further demonstrate that PDE-Refiner greatly enhances data efficiency, since the denoising objective implicitly induces a novel form of spectral data augmentation. Finally, PDE-Refiner's connection to diffusion models enables an accurate and efficient assessment of the model's predictive uncertainty, allowing us to estimate when the surrogate becomes inaccurate.
    FALL-E: A Foley Sound Synthesis Model and Strategies. (arXiv:2306.09807v2 [eess.AS] UPDATED)
    This paper introduces FALL-E, a foley synthesis system and its training/inference strategies. The FALL-E model employs a cascaded approach comprising low-resolution spectrogram generation, spectrogram super-resolution, and a vocoder. We trained every sound-related model from scratch using our extensive datasets, and utilized a pre-trained language model. We conditioned the model with dataset-specific texts, enabling it to learn sound quality and recording environment based on text input. Moreover, we leveraged external language models to improve text descriptions of our datasets and performed prompt engineering for quality, coherence, and diversity. FALL-E was evaluated by an objective measure as well as listening tests in the DCASE 2023 challenge Task 7. The submission achieved the second place on average, while achieving the best score for diversity, second place for audio quality, and third place for class fitness.  ( 2 min )
    RALACs: Action Recognition in Autonomous Vehicles using Interaction Encoding and Optical Flow. (arXiv:2209.14408v2 [cs.CV] UPDATED)
    When applied to autonomous vehicle (AV) settings, action recognition can enhance an environment model's situational awareness. This is especially prevalent in scenarios where traditional geometric descriptions and heuristics in AVs are insufficient. However, action recognition has traditionally been studied for humans, and its limited adaptability to noisy, un-clipped, un-pampered, raw RGB data has limited its application in other fields. To push for the advancement and adoption of action recognition into AVs, this work proposes a novel two-stage action recognition system, termed RALACs. RALACs formulates the problem of action recognition for road scenes, and bridges the gap between it and the established field of human action recognition. This work shows how attention layers can be useful for encoding the relations across agents, and stresses how such a scheme can be class-agnostic. Furthermore, to address the dynamic nature of agents on the road, RALACs constructs a novel approach to adapting Region of Interest (ROI) Alignment to agent tracks for downstream action classification. Finally, our scheme also considers the problem of active agent detection, and utilizes a novel application of fusing optical flow maps to discern relevant agents in a road scene. We show that our proposed scheme can outperform the baseline on the ICCV2021 Road Challenge dataset and by deploying it on a real vehicle platform, we provide preliminary insight to the usefulness of action recognition in decision making.  ( 3 min )
    Overlooked Implications of the Reconstruction Loss for VAE Disentanglement. (arXiv:2202.13341v3 [cs.LG] UPDATED)
    Learning disentangled representations with variational autoencoders (VAEs) is often attributed to the regularisation component of the loss. In this work, we highlight the interaction between data and the reconstruction term of the loss as the main contributor to disentanglement in VAEs. We show that standard benchmark datasets have unintended correlations between their subjective ground-truth factors and perceived axes in the data according to typical VAE reconstruction losses. Our work exploits this relationship to provide a theory for what constitutes an adversarial dataset under a given reconstruction loss. We verify this by constructing an example dataset that prevents disentanglement in state-of-the-art frameworks while maintaining human-intuitive ground-truth factors. Finally, we re-enable disentanglement by designing an example reconstruction loss that is once again able to perceive the ground-truth factors. Our findings demonstrate the subjective nature of disentanglement and the importance of considering the interaction between the ground-truth factors, data and notably, the reconstruction loss, which is under-recognised in the literature.  ( 2 min )
    Financial Fraud Detection: A Comparative Study of Quantum Machine Learning Models. (arXiv:2308.05237v1 [quant-ph])
    In this research, a comparative study of four Quantum Machine Learning (QML) models was conducted for fraud detection in finance. We proved that the Quantum Support Vector Classifier model achieved the highest performance, with F1 scores of 0.98 for fraud and non-fraud classes. Other models like the Variational Quantum Classifier, Estimator Quantum Neural Network (QNN), and Sampler QNN demonstrate promising results, propelling the potential of QML classification for financial applications. While they exhibit certain limitations, the insights attained pave the way for future enhancements and optimisation strategies. However, challenges exist, including the need for more efficient Quantum algorithms and larger and more complex datasets. The article provides solutions to overcome current limitations and contributes new insights to the field of Quantum Machine Learning in fraud detection, with important implications for its future development.  ( 2 min )
    Spatial Gated Multi-Layer Perceptron for Land Use and Land Cover Mapping. (arXiv:2308.05235v1 [cs.CV])
    Convolutional Neural Networks (CNNs) are models that are utilized extensively for the hierarchical extraction of features. Vision transformers (ViTs), through the use of a self-attention mechanism, have recently achieved superior modeling of global contextual information compared to CNNs. However, to realize their image classification strength, ViTs require substantial training datasets. Where the available training data are limited, current advanced multi-layer perceptrons (MLPs) can provide viable alternatives to both deep CNNs and ViTs. In this paper, we developed the SGU-MLP, a learning algorithm that effectively uses both MLPs and spatial gating units (SGUs) for precise land use land cover (LULC) mapping. Results illustrated the superiority of the developed SGU-MLP classification algorithm over several CNN and CNN-ViT-based models, including HybridSN, ResNet, iFormer, EfficientFormer and CoAtNet. The proposed SGU-MLP algorithm was tested through three experiments in Houston, USA, Berlin, Germany and Augsburg, Germany. The SGU-MLP classification model was found to consistently outperform the benchmark CNN and CNN-ViT-based algorithms. For example, for the Houston experiment, SGU-MLP significantly outperformed HybridSN, CoAtNet, Efficientformer, iFormer and ResNet by approximately 15%, 19%, 20%, 21%, and 25%, respectively, in terms of average accuracy. The code will be made publicly available at https://github.com/aj1365/SGUMLP  ( 2 min )
    Comparative Analysis of Epileptic Seizure Prediction: Exploring Diverse Pre-Processing Techniques and Machine Learning Models. (arXiv:2308.05176v1 [eess.SP])
    Epilepsy is a prevalent neurological disorder characterized by recurrent and unpredictable seizures, necessitating accurate prediction for effective management and patient care. Application of machine learning (ML) on electroencephalogram (EEG) recordings, along with its ability to provide valuable insights into brain activity during seizures, is able to make accurate and robust seizure prediction an indispensable component in relevant studies. In this research, we present a comprehensive comparative analysis of five machine learning models - Random Forest (RF), Decision Tree (DT), Extra Trees (ET), Logistic Regression (LR), and Gradient Boosting (GB) - for the prediction of epileptic seizures using EEG data. The dataset underwent meticulous preprocessing, including cleaning, normalization, outlier handling, and oversampling, ensuring data quality and facilitating accurate model training. These preprocessing techniques played a crucial role in enhancing the models' performance. The results of our analysis demonstrate the performance of each model in terms of accuracy. The LR classifier achieved an accuracy of 56.95%, while GB and DT both attained 97.17% accuracy. RT achieved a higher accuracy of 98.99%, while the ET model exhibited the best performance with an accuracy of 99.29%. Our findings reveal that the ET model outperformed not only the other models in the comparative analysis but also surpassed the state-of-the-art results from previous research. The superior performance of the ET model makes it a compelling choice for accurate and robust epileptic seizure prediction using EEG data.  ( 3 min )
    ReLU and Addition-based Gated RNN. (arXiv:2308.05629v1 [cs.LG])
    We replace the multiplication and sigmoid function of the conventional recurrent gate with addition and ReLU activation. This mechanism is designed to maintain long-term memory for sequence processing but at a reduced computational cost, thereby opening up for more efficient execution or larger models on restricted hardware. Recurrent Neural Networks (RNNs) with gating mechanisms such as LSTM and GRU have been widely successful in learning from sequential data due to their ability to capture long-term dependencies. Conventionally, the update based on current inputs and the previous state history is each multiplied with dynamic weights and combined to compute the next state. However, multiplication can be computationally expensive, especially for certain hardware architectures or alternative arithmetic systems such as homomorphic encryption. It is demonstrated that the novel gating mechanism can capture long-term dependencies for a standard synthetic sequence learning task while significantly reducing computational costs such that execution time is reduced by half on CPU and by one-third under encryption. Experimental results on handwritten text recognition tasks furthermore show that the proposed architecture can be trained to achieve comparable accuracy to conventional GRU and LSTM baselines. The gating mechanism introduced in this paper may enable privacy-preserving AI applications operating under homomorphic encryption by avoiding the multiplication of encrypted variables. It can also support quantization in (unencrypted) plaintext applications, with the potential for substantial performance gains since the addition-based formulation can avoid the expansion to double precision often required for multiplication.  ( 2 min )
    Analyzing the Effect of Data Impurity on the Detection Performances of Mental Disorders. (arXiv:2308.05133v1 [q-bio.NC])
    The primary method for identifying mental disorders automatically has traditionally involved using binary classifiers. These classifiers are trained using behavioral data obtained from an interview setup. In this training process, data from individuals with the specific disorder under consideration are categorized as the positive class, while data from all other participants constitute the negative class. In practice, it is widely recognized that certain mental disorders share similar symptoms, causing the collected behavioral data to encompass a variety of attributes associated with multiple disorders. Consequently, attributes linked to the targeted mental disorder might also be present within the negative class. This data impurity may lead to sub-optimal training of the classifier for a mental disorder of interest. In this study, we investigate this hypothesis in the context of major depressive disorder (MDD) and post-traumatic stress disorder detection (PTSD). The results show that upon removal of such data impurity, MDD and PTSD detection performances are significantly improved.  ( 2 min )
    Data-Free Model Extraction Attacks in the Context of Object Detection. (arXiv:2308.05127v1 [cs.CR])
    A significant number of machine learning models are vulnerable to model extraction attacks, which focus on stealing the models by using specially curated queries against the target model. This task is well accomplished by using part of the training data or a surrogate dataset to train a new model that mimics a target model in a white-box environment. In pragmatic situations, however, the target models are trained on private datasets that are inaccessible to the adversary. The data-free model extraction technique replaces this problem when it comes to using queries artificially curated by a generator similar to that used in Generative Adversarial Nets. We propose for the first time, to the best of our knowledge, an adversary black box attack extending to a regression problem for predicting bounding box coordinates in object detection. As part of our study, we found that defining a loss function and using a novel generator setup is one of the key aspects in extracting the target model. We find that the proposed model extraction method achieves significant results by using reasonable queries. The discovery of this object detection vulnerability will support future prospects for securing such models.  ( 2 min )
    Can Attention Be Used to Explain EHR-Based Mortality Prediction Tasks: A Case Study on Hemorrhagic Stroke. (arXiv:2308.05110v1 [cs.LG])
    Stroke is a significant cause of mortality and morbidity, necessitating early predictive strategies to minimize risks. Traditional methods for evaluating patients, such as Acute Physiology and Chronic Health Evaluation (APACHE II, IV) and Simplified Acute Physiology Score III (SAPS III), have limited accuracy and interpretability. This paper proposes a novel approach: an interpretable, attention-based transformer model for early stroke mortality prediction. This model seeks to address the limitations of previous predictive models, providing both interpretability (providing clear, understandable explanations of the model) and fidelity (giving a truthful explanation of the model's dynamics from input to output). Furthermore, the study explores and compares fidelity and interpretability scores using Shapley values and attention-based scores to improve model explainability. The research objectives include designing an interpretable attention-based transformer model, evaluating its performance compared to existing models, and providing feature importance derived from the model.  ( 2 min )
    Symmetry Defense Against XGBoost Adversarial Perturbation Attacks. (arXiv:2308.05575v1 [cs.LG])
    We examine whether symmetry can be used to defend tree-based ensemble classifiers such as gradient-boosting decision trees (GBDTs) against adversarial perturbation attacks. The idea is based on a recent symmetry defense for convolutional neural network classifiers (CNNs) that utilizes CNNs' lack of invariance with respect to symmetries. CNNs lack invariance because they can classify a symmetric sample, such as a horizontally flipped image, differently from the original sample. CNNs' lack of invariance also means that CNNs can classify symmetric adversarial samples differently from the incorrect classification of adversarial samples. Using CNNs' lack of invariance, the recent CNN symmetry defense has shown that the classification of symmetric adversarial samples reverts to the correct sample classification. In order to apply the same symmetry defense to GBDTs, we examine GBDT invariance and are the first to show that GBDTs also lack invariance with respect to symmetries. We apply and evaluate the GBDT symmetry defense for nine datasets against six perturbation attacks with a threat model that ranges from zero-knowledge to perfect-knowledge adversaries. Using the feature inversion symmetry against zero-knowledge adversaries, we achieve up to 100% accuracy on adversarial samples even when default and robust classifiers have 0% accuracy. Using the feature inversion and horizontal flip symmetries against perfect-knowledge adversaries, we achieve up to over 95% accuracy on adversarial samples for the GBDT classifier of the F-MNIST dataset even when default and robust classifiers have 0% accuracy.  ( 2 min )
    Balancing Accuracy and Training Time in Federated Learning for Violence Detection in Surveillance Videos: A Study of Neural Network Architectures. (arXiv:2308.05106v1 [cs.CV])
    This paper presents an investigation into machine learning techniques for violence detection in videos and their adaptation to a federated learning context. The study includes experiments with spatio-temporal features extracted from benchmark video datasets, comparison of different methods, and proposal of a modified version of the "Flow-Gated" architecture called "Diff-Gated." Additionally, various machine learning techniques, including super-convergence and transfer learning, are explored, and a method for adapting centralized datasets to a federated learning context is developed. The research achieves better accuracy results compared to state-of-the-art models by training the best violence detection model in a federated learning context.  ( 2 min )
    Sound propagation in realistic interactive 3D scenes with parameterized sources using deep neural operators. (arXiv:2308.05141v1 [cs.SD])
    We address the challenge of sound propagation simulations in $3$D virtual rooms with moving sources, which have applications in virtual/augmented reality, game audio, and spatial computing. Solutions to the wave equation can describe wave phenomena such as diffraction and interference. However, simulating them using conventional numerical discretization methods with hundreds of source and receiver positions is intractable, making stimulating a sound field with moving sources impractical. To overcome this limitation, we propose using deep operator networks to approximate linear wave-equation operators. This enables the rapid prediction of sound propagation in realistic 3D acoustic scenes with moving sources, achieving millisecond-scale computations. By learning a compact surrogate model, we avoid the offline calculation and storage of impulse responses for all relevant source/listener pairs. Our experiments, including various complex scene geometries, show good agreement with reference solutions, with root mean squared errors ranging from 0.02 Pa to 0.10 Pa. Notably, our method signifies a paradigm shift as no prior machine learning approach has achieved precise predictions of complete wave fields within realistic domains. We anticipate that our findings will drive further exploration of deep neural operator methods, advancing research in immersive user experiences within virtual environments.  ( 2 min )
    Vector Embeddings by Sequence Similarity and Context for Improved Compression, Similarity Search, Clustering, Organization, and Manipulation of cDNA Libraries. (arXiv:2308.05118v1 [q-bio.GN])
    This paper demonstrates the utility of organized numerical representations of genes in research involving flat string gene formats (i.e., FASTA/FASTQ5). FASTA/FASTQ files have several current limitations, such as their large file sizes, slow processing speeds for mapping and alignment, and contextual dependencies. These challenges significantly hinder investigations and tasks that involve finding similar sequences. The solution lies in transforming sequences into an alternative representation that facilitates easier clustering into similar groups compared to the raw sequences themselves. By assigning a unique vector embedding to each short sequence, it is possible to more efficiently cluster and improve upon compression performance for the string representations of cDNA libraries. Furthermore, through learning alternative coordinate vector embeddings based on the contexts of codon triplets, we can demonstrate clustering based on amino acid properties. Finally, using this sequence embedding method to encode barcodes and cDNA sequences, we can improve the time complexity of the similarity search by coupling vector embeddings with an algorithm that determines the proximity of vectors in Euclidean space; this allows us to perform sequence similarity searches in a quicker and more modular fashion.  ( 2 min )
    Copy Number Variation Informs fMRI-based Prediction of Autism Spectrum Disorder. (arXiv:2308.05122v1 [q-bio.QM])
    The multifactorial etiology of autism spectrum disorder (ASD) suggests that its study would benefit greatly from multimodal approaches that combine data from widely varying platforms, e.g., neuroimaging, genetics, and clinical characterization. Prior neuroimaging-genetic analyses often apply naive feature concatenation approaches in data-driven work or use the findings from one modality to guide posthoc analysis of another, missing the opportunity to analyze the paired multimodal data in a truly unified approach. In this paper, we develop a more integrative model for combining genetic, demographic, and neuroimaging data. Inspired by the influence of genotype on phenotype, we propose using an attention-based approach where the genetic data guides attention to neuroimaging features of importance for model prediction. The genetic data is derived from copy number variation parameters, while the neuroimaging data is from functional magnetic resonance imaging. We evaluate the proposed approach on ASD classification and severity prediction tasks, using a sex-balanced dataset of 228 ASD and typically developing subjects in a 10-fold cross-validation framework. We demonstrate that our attention-based model combining genetic information, demographic data, and functional magnetic resonance imaging results in superior prediction performance compared to other multimodal approaches.  ( 2 min )
    Deep Learning for Morphological Identification of Extended Radio Galaxies using Weak Labels. (arXiv:2308.05166v1 [astro-ph.IM])
    The present work discusses the use of a weakly-supervised deep learning algorithm that reduces the cost of labelling pixel-level masks for complex radio galaxies with multiple components. The algorithm is trained on weak class-level labels of radio galaxies to get class activation maps (CAMs). The CAMs are further refined using an inter-pixel relations network (IRNet) to get instance segmentation masks over radio galaxies and the positions of their infrared hosts. We use data from the Australian Square Kilometre Array Pathfinder (ASKAP) telescope, specifically the Evolutionary Map of the Universe (EMU) Pilot Survey, which covered a sky area of 270 square degrees with an RMS sensitivity of 25-35 $\mu$Jy/beam. We demonstrate that weakly-supervised deep learning algorithms can achieve high accuracy in predicting pixel-level information, including masks for the extended radio emission encapsulating all galaxy components and the positions of the infrared host galaxies. We evaluate the performance of our method using mean Average Precision (mAP) across multiple classes at a standard intersection over union (IoU) threshold of 0.5. We show that the model achieves a mAP$_{50}$ of 67.5\% and 76.8\% for radio masks and infrared host positions, respectively. The network architecture can be found at the following link: https://github.com/Nikhel1/Gal-CAM  ( 3 min )
    Two Novel Approaches to Detect Community: A Case Study of Omicron Lineage Variants PPI Network. (arXiv:2308.05125v1 [q-bio.MN])
    The capacity to identify and analyze protein-protein interactions, along with their internal modular organization, plays a crucial role in comprehending the intricate mechanisms underlying biological processes at the molecular level. We can learn a lot about the structure and dynamics of these interactions by using network analysis. We can improve our understanding of the biological roots of disease pathogenesis by recognizing network communities. This knowledge, in turn, holds significant potential for driving advancements in drug discovery and facilitating personalized medicine approaches for disease treatment. In this study, we aimed to uncover the communities within the variant B.1.1.529 (Omicron virus) using two proposed novel algorithm (ABCDE and ALCDE) and four widely recognized algorithms: Girvan-Newman, Louvain, Leiden, and Label Propagation algorithm. Each of these algorithms has established prominence in the field and offers unique perspectives on identifying communities within complex networks. We also compare the networks by the global properties, statistic summary, subgraph count, graphlet and validate by the modulaity. By employing these approaches, we sought to gain deeper insights into the structural organization and interconnections present within the Omicron virus network.  ( 2 min )
    PTransIPs: Identification of phosphorylation sites based on protein pretrained language model and Transformer. (arXiv:2308.05115v1 [q-bio.QM])
    Phosphorylation is central to numerous fundamental cellular processes, influencing the onset and progression of a variety of diseases. Identification of phosphorylation sites is thus an important step for understanding the molecular mechanisms of cells and virus infection, which potentially leads to new therapeutic targets. In this study, we present PTransIPs, a novel deep learning model for the identification of phosphorylation sites. PTransIPs treats amino acids in protein sequences as words in natural language, extracting unique encodings based on the types along with position of amino acids in the sequence. It also incorporates embeddings from large pre-trained protein models as additional data inputs. PTransIPS is further trained on a combination model of convolutional neural network with residual connections and Transformer model equipped with multi-head attention mechanisms. At last, the model outputs classification results through a fully connected layer. The results of independent testing reveal that PTransIPs outperforms existing state-of-the-art methodologies, achieving AUROCs of 0.9232 and 0.9660 for identifying phosphorylated S/T and Y sites respectively. In addition, ablation studies prove that pretrained model embeddings contribute to the performance of PTransIPs. Furthermore, PTransIPs has interpretable amino acid preference, visible training process and shows generalizability on other bioactivity classification tasks. To facilitate usage, our code and data are publicly accessible at \url{https://github.com/StatXzy7/PTransIPs}.  ( 2 min )
    Are Sex-based Physiological Differences the Cause of Gender Bias for Chest X-ray Diagnosis?. (arXiv:2308.05129v1 [eess.IV])
    While many studies have assessed the fairness of AI algorithms in the medical field, the causes of differences in prediction performance are often unknown. This lack of knowledge about the causes of bias hampers the efficacy of bias mitigation, as evidenced by the fact that simple dataset balancing still often performs best in reducing performance gaps but is unable to resolve all performance differences. In this work, we investigate the causes of gender bias in machine learning-based chest X-ray diagnosis. In particular, we explore the hypothesis that breast tissue leads to underexposure of the lungs and causes lower model performance. Methodologically, we propose a new sampling method which addresses the highly skewed distribution of recordings per patient in two widely used public datasets, while at the same time reducing the impact of label errors. Our comprehensive analysis of gender differences across diseases, datasets, and gender representations in the training set shows that dataset imbalance is not the sole cause of performance differences. Moreover, relative group performance differs strongly between datasets, indicating important dataset-specific factors influencing male/female group performance. Finally, we investigate the effect of breast tissue more specifically, by cropping out the breasts from recordings, finding that this does not resolve the observed performance gaps. In conclusion, our results indicate that dataset-specific factors, not fundamental physiological differences, are the main drivers of male--female performance gaps in chest X-ray analyses on widely used NIH and CheXpert Dataset.  ( 3 min )
    Dynamic Model Agnostic Reliability Evaluation of Machine-Learning Methods Integrated in Instrumentation & Control Systems. (arXiv:2308.05120v1 [cs.LG])
    In recent years, the field of data-driven neural network-based machine learning (ML) algorithms has grown significantly and spurred research in its applicability to instrumentation and control systems. While they are promising in operational contexts, the trustworthiness of such algorithms is not adequately assessed. Failures of ML-integrated systems are poorly understood; the lack of comprehensive risk modeling can degrade the trustworthiness of these systems. In recent reports by the National Institute for Standards and Technology, trustworthiness in ML is a critical barrier to adoption and will play a vital role in intelligent systems' safe and accountable operation. Thus, in this work, we demonstrate a real-time model-agnostic method to evaluate the relative reliability of ML predictions by incorporating out-of-distribution detection on the training dataset. It is well documented that ML algorithms excel at interpolation (or near-interpolation) tasks but significantly degrade at extrapolation. This occurs when new samples are "far" from training samples. The method, referred to as the Laplacian distributed decay for reliability (LADDR), determines the difference between the operational and training datasets, which is used to calculate a prediction's relative reliability. LADDR is demonstrated on a feedforward neural network-based model used to predict safety significant factors during different loss-of-flow transients. LADDR is intended as a "data supervisor" and determines the appropriateness of well-trained ML models in the context of operational conditions. Ultimately, LADDR illustrates how training data can be used as evidence to support the trustworthiness of ML predictions when utilized for conventional interpolation tasks.  ( 3 min )
  • Open

    SLEM: Machine Learning for Path Modeling and Causal Inference with Super Learner Equation Modeling. (arXiv:2308.04365v3 [stat.ML] UPDATED)
    Causal inference is a crucial goal of science, enabling researchers to arrive at meaningful conclusions regarding the predictions of hypothetical interventions using observational data. Path models, Structural Equation Models (SEMs), and, more generally, Directed Acyclic Graphs (DAGs), provide a means to unambiguously specify assumptions regarding the causal structure underlying a phenomenon. Unlike DAGs, which make very few assumptions about the functional and parametric form, SEM assumes linearity. This can result in functional misspecification which prevents researchers from undertaking reliable effect size estimation. In contrast, we propose Super Learner Equation Modeling, a path modeling technique integrating machine learning Super Learner ensembles. We empirically demonstrate its ability to provide consistent and unbiased estimates of causal effects, its competitive performance for linear models when compared with SEM, and highlight its superiority over SEM when dealing with non-linear relationships. We provide open-source code, and a tutorial notebook with example usage, accentuating the easy-to-use nature of the method.
    Automatic Extraction of Relevant Road Infrastructure using Connected vehicle data and Deep Learning Model. (arXiv:2308.05658v1 [cs.AI])
    In today's rapidly evolving urban landscapes, efficient and accurate mapping of road infrastructure is critical for optimizing transportation systems, enhancing road safety, and improving the overall mobility experience for drivers and commuters. Yet, a formidable bottleneck obstructs progress - the laborious and time-intensive manual identification of intersections. Simply considering the shear number of intersections that need to be identified, and the labor hours required per intersection, the need for an automated solution becomes undeniable. To address this challenge, we propose a novel approach that leverages connected vehicle data and cutting-edge deep learning techniques. By employing geohashing to segment vehicle trajectories and then generating image representations of road segments, we utilize the YOLOv5 (You Only Look Once version 5) algorithm for accurate classification of both straight road segments and intersections. Experimental results demonstrate an impressive overall classification accuracy of 95%, with straight roads achieving a remarkable 97% F1 score and intersections reaching a 90% F1 score. This approach not only saves time and resources but also enables more frequent updates and a comprehensive understanding of the road network. Our research showcases the potential impact on traffic management, urban planning, and autonomous vehicle navigation systems. The fusion of connected vehicle data and deep learning models holds promise for a transformative shift in road infrastructure mapping, propelling us towards a smarter, safer, and more connected transportation ecosystem.
    A survey of some recent developments in measures of association. (arXiv:2211.04702v2 [stat.ME] UPDATED)
    This paper surveys some recent developments in measures of association related to a new coefficient of correlation introduced by the author. A straightforward extension of this coefficient to standard Borel spaces (which includes all Polish spaces), overlooked in the literature so far, is proposed at the end of the survey.
    Simplifying Momentum-based Positive-definite Submanifold Optimization with Applications to Deep Learning. (arXiv:2302.09738v7 [stat.ML] UPDATED)
    Riemannian submanifold optimization with momentum is computationally challenging because, to ensure that the iterates remain on the submanifold, we often need to solve difficult differential equations. Here, we simplify such difficulties for a class of sparse or structured symmetric positive-definite matrices with the affine-invariant metric. We do so by proposing a generalized version of the Riemannian normal coordinates that dynamically orthonormalizes the metric and locally converts the problem into an unconstrained problem in the Euclidean space. We use our approach to simplify existing approaches for structured covariances and develop matrix-inverse-free $2^\text{nd}$-order optimizers for deep learning with low precision by using only matrix multiplications. Code: https://github.com/yorkerlin/StructuredNGD-DL
    Normalized Gradients for All. (arXiv:2308.05621v1 [cs.LG])
    In this short note, I show how to adapt to H\"{o}lder smoothness using normalized gradients in a black-box way. Moreover, the bound will depend on a novel notion of local H\"{o}lder smoothness. The main idea directly comes from Levy [2017].
    Generative Diffusion Models for Radio Wireless Channel Modelling and Sampling. (arXiv:2308.05583v1 [cs.AI])
    Channel modelling is essential to designing modern wireless communication systems. The increasing complexity of channel modelling and the cost of collecting high-quality wireless channel data have become major challenges. In this paper, we propose a diffusion model based channel sampling approach for rapidly synthesizing channel realizations from limited data. We use a diffusion model with a U Net based architecture operating in the frequency space domain. To evaluate how well the proposed model reproduces the true distribution of channels in the training dataset, two evaluation metrics are used: $i)$ the approximate $2$-Wasserstein distance between real and generated distributions of the normalized power spectrum in the antenna and frequency domains and $ii)$ precision and recall metric for distributions. We show that, compared to existing GAN based approaches which suffer from mode collapse and unstable training, our diffusion based approach trains stably and generates diverse and high-fidelity samples from the true channel distribution. We also show that we can pretrain the model on a simulated urban macro-cellular channel dataset and fine-tune it on a smaller, out-of-distribution urban micro-cellular dataset, therefore showing that it is feasible to model real world channels using limited data with this approach.
    Inverse Extended Kalman Filter -- Part II: Highly Non-Linear and Uncertain Systems. (arXiv:2208.06683v2 [math.OC] UPDATED)
    Counter-adversarial system design problems have lately motivated the development of inverse Bayesian filters. For example, inverse Kalman filter (I-KF) has been recently formulated to estimate the adversary's Kalman-filter-tracked estimates and hence, predict the adversary's future steps. The purpose of this paper and the companion paper (Part I) is to address the inverse filtering problem in non-linear systems by proposing an inverse extended Kalman filter (I-EKF). The companion paper proposed the theory of I-EKF (with and without unknown inputs) and I-KF (with unknown inputs). In this paper, we develop this theory for highly non-linear models, which employ second-order, Gaussian sum, and dithered forward EKFs. In particular, we derive theoretical stability guarantees for the inverse second-order EKF using the bounded non-linearity approach. To address the limitation of the standard I-EKFs that the system model and forward filter are perfectly known to the defender, we propose reproducing kernel Hilbert space-based EKF to learn the unknown system dynamics based on its observations, which can be employed as an inverse filter to infer the adversary's estimate. Numerical experiments demonstrate the state estimation performance of the proposed filters using recursive Cram\'{e}r-Rao lower bound as a benchmark.
    InfoNCE is variational inference in a recognition parameterised model. (arXiv:2107.02495v3 [stat.ML] UPDATED)
    Here, we show that the InfoNCE objective is equivalent to the ELBO in a new class of probabilistic generative model, the recognition parameterised model (RPM). When we learn the optimal prior, the RPM ELBO becomes equal to the mutual information (MI; up to a constant), establishing a connection to pre-existing self-supervised learning methods such as InfoNCE. However, practical InfoNCE methods do not use the MI as an objective; the MI is invariant to arbitrary invertible transformations, so using an MI objective can lead to highly entangled representations (Tschannen et al., 2019). Instead, the actual InfoNCE objective is a simplified lower bound on the MI which is loose even in the infinite sample limit. Thus, an objective that works (i.e. the actual InfoNCE objective) appears to be motivated as a loose bound on an objective that does not work (i.e. the true MI which gives arbitrarily entangled representations). We give an alternative motivation for the actual InfoNCE objective. In particular, we show that in the infinite sample limit, and for a particular choice of prior, the actual InfoNCE objective is equal to the ELBO (up to a constant); and the ELBO is equal to the marginal likelihood with a deterministic recognition model. Thus, we argue that our VAE perspective gives a better motivation for InfoNCE than MI, as the actual InfoNCE objective is only loosely bounded by the MI, but is equal to the ELBO/marginal likelihood (up to a constant).
    Selective inference using randomized group lasso estimators for general models. (arXiv:2306.13829v2 [stat.ME] UPDATED)
    Selective inference methods are developed for group lasso estimators for use with a wide class of distributions and loss functions. The method includes the use of exponential family distributions, as well as quasi-likelihood modeling for overdispersed count data, for example, and allows for categorical or grouped covariates as well as continuous covariates. A randomized group-regularized optimization problem is studied. The added randomization allows us to construct a post-selection likelihood which we show to be adequate for selective inference when conditioning on the event of the selection of the grouped covariates. This likelihood also provides a selective point estimator, accounting for the selection by the group lasso. Confidence regions for the regression parameters in the selected model take the form of Wald-type regions and are shown to have bounded volume. The selective inference method for grouped lasso is illustrated on data from the national health and nutrition examination survey while simulations showcase its behaviour and favorable comparison with other methods.
    Updating Clinical Risk Stratification Models Using Rank-Based Compatibility: Approaches for Evaluating and Optimizing Clinician-Model Team Performance. (arXiv:2308.05619v1 [stat.ML])
    As data shift or new data become available, updating clinical machine learning models may be necessary to maintain or improve performance over time. However, updating a model can introduce compatibility issues when the behavior of the updated model does not align with user expectations, resulting in poor user-model team performance. Existing compatibility measures depend on model decision thresholds, limiting their applicability in settings where models are used to generate rankings based on estimated risk. To address this limitation, we propose a novel rank-based compatibility measure, $C^R$, and a new loss function that aims to optimize discriminative performance while encouraging good compatibility. Applied to a case study in mortality risk stratification leveraging data from MIMIC, our approach yields more compatible models while maintaining discriminative performance compared to existing model selection techniques, with an increase in $C^R$ of $0.019$ ($95\%$ confidence interval: $0.005$, $0.035$). This work provides new tools to analyze and update risk stratification models used in clinical care.
    From Random Search to Bandit Learning in Metric Measure Spaces. (arXiv:2305.11509v4 [cs.LG] UPDATED)
    Random Search is one of the most widely-used method for Hyperparameter Optimization, and is critical to the success of deep learning models. Despite its astonishing performance, little non-heuristic theory has been developed to describe the underlying working mechanism. This paper gives a theoretical accounting of Random Search. We introduce the concept of \emph{scattering dimension} that describes the landscape of the underlying function, and quantifies the performance of random search. We show that, when the environment is noise-free, the output of random search converges to the optimal value in probability at rate $ \widetilde{\mathcal{O}} \left( \left( \frac{1}{T} \right)^{ \frac{1}{d_s} } \right) $, where $ d_s \ge 0 $ is the scattering dimension of the underlying function. When the observed function values are corrupted by bounded $iid$ noise, the output of random search converges to the optimal value in probability at rate $ \widetilde{\mathcal{O}} \left( \left( \frac{1}{T} \right)^{ \frac{1}{d_s + 1} } \right) $. In addition, based on the principles of random search, we introduce an algorithm, called BLiN-MOS, for Lipschitz bandits in doubling metric spaces that are also endowed with a probability measure, and show that BLiN-MOS achieves a regret rate of order $ \widetilde{\mathcal{O}} \left( T^{ \frac{d_z}{d_z + 1} } \right) $, where $d_z$ is the zooming dimension of the problem instance.
    Functional Neural Networks: Shift invariant models for functional data with applications to EEG classification. (arXiv:2301.05869v2 [cs.LG] UPDATED)
    It is desirable for statistical models to detect signals of interest independently of their position. If the data is generated by some smooth process, this additional structure should be taken into account. We introduce a new class of neural networks that are shift invariant and preserve smoothness of the data: functional neural networks (FNNs). For this, we use methods from functional data analysis (FDA) to extend multi-layer perceptrons and convolutional neural networks to functional data. We propose different model architectures, show that the models outperform a benchmark model from FDA in terms of accuracy and successfully use FNNs to classify electroencephalography (EEG) data.
    Selective Inference for Sparse Multitask Regression with Applications in Neuroimaging. (arXiv:2205.14220v4 [stat.ME] UPDATED)
    Multi-task learning is frequently used to model a set of related response variables from the same set of features, improving predictive performance and modeling accuracy relative to methods that handle each response variable separately. Despite the potential of multi-task learning to yield more powerful inference than single-task alternatives, prior work in this area has largely omitted uncertainty quantification. Our focus in this paper is a common multi-task problem in neuroimaging, where the goal is to understand the relationship between multiple cognitive task scores (or other subject-level assessments) and brain connectome data collected from imaging. We propose a framework for selective inference to address this problem, with the flexibility to: (i) jointly identify the relevant covariates for each task through a sparsity-inducing penalty, and (ii) conduct valid inference in a model based on the estimated sparsity structure. Our framework offers a new conditional procedure for inference, based on a refinement of the selection event that yields a tractable selection-adjusted likelihood. This gives an approximate system of estimating equations for maximum likelihood inference, solvable via a single convex optimization problem, and enables us to efficiently form confidence intervals with approximately the correct coverage. Applied to both simulated data and data from the Adolescent Brain Cognitive Development (ABCD) study, our selective inference methods yield tighter confidence intervals than commonly used alternatives, such as data splitting. We also demonstrate through simulations that multi-task learning with selective inference can more accurately recover true signals than single-task methods.
    TSLiNGAM: DirectLiNGAM under heavy tails. (arXiv:2308.05422v1 [stat.ME])
    One of the established approaches to causal discovery consists of combining directed acyclic graphs (DAGs) with structural causal models (SCMs) to describe the functional dependencies of effects on their causes. Possible identifiability of SCMs given data depends on assumptions made on the noise variables and the functional classes in the SCM. For instance, in the LiNGAM model, the functional class is restricted to linear functions and the disturbances have to be non-Gaussian. In this work, we propose TSLiNGAM, a new method for identifying the DAG of a causal model based on observational data. TSLiNGAM builds on DirectLiNGAM, a popular algorithm which uses simple OLS regression for identifying causal directions between variables. TSLiNGAM leverages the non-Gaussianity assumption of the error terms in the LiNGAM model to obtain more efficient and robust estimation of the causal structure. TSLiNGAM is justified theoretically and is studied empirically in an extensive simulation study. It performs significantly better on heavy-tailed and skewed data and demonstrates a high small-sample efficiency. In addition, TSLiNGAM also shows better robustness properties as it is more resilient to contamination.
    Unifying Distributionally Robust Optimization via Optimal Transport Theory. (arXiv:2308.05414v1 [math.OC])
    In the past few years, there has been considerable interest in two prominent approaches for Distributionally Robust Optimization (DRO): Divergence-based and Wasserstein-based methods. The divergence approach models misspecification in terms of likelihood ratios, while the latter models it through a measure of distance or cost in actual outcomes. Building upon these advances, this paper introduces a novel approach that unifies these methods into a single framework based on optimal transport (OT) with conditional moment constraints. Our proposed approach, for example, makes it possible for optimal adversarial distributions to simultaneously perturb likelihood and outcomes, while producing an optimal (in an optimal transport sense) coupling between the baseline model and the adversarial model.Additionally, the paper investigates several duality results and presents tractable reformulations that enhance the practical applicability of this unified framework.
    Learning ground states of gapped quantum Hamiltonians with Kernel Methods. (arXiv:2303.08902v2 [quant-ph] UPDATED)
    Neural network approaches to approximate the ground state of quantum hamiltonians require the numerical solution of a highly nonlinear optimization problem. We introduce a statistical learning approach that makes the optimization trivial by using kernel methods. Our scheme is an approximate realization of the power method, where supervised learning is used to learn the next step of the power iteration. We show that the ground state properties of arbitrary gapped quantum hamiltonians can be reached with polynomial resources under the assumption that the supervised learning is efficient. Using kernel ridge regression, we provide numerical evidence that the learning assumption is verified by applying our scheme to find the ground states of several prototypical interacting many-body quantum systems, both in one and two dimensions, showing the flexibility of our approach.
    Exploring Deep Learning Approaches to Predict Person and Vehicle Trips: An Analysis of NHTS Data. (arXiv:2308.05665v1 [cs.AI])
    Modern transportation planning relies heavily on accurate predictions of person and vehicle trips. However, traditional planning models often fail to account for the intricacies and dynamics of travel behavior, leading to less-than-optimal accuracy in these predictions. This study explores the potential of deep learning techniques to transform the way we approach trip predictions, and ultimately, transportation planning. Utilizing a comprehensive dataset from the National Household Travel Survey (NHTS), we developed and trained a deep learning model for predicting person and vehicle trips. The proposed model leverages the vast amount of information in the NHTS data, capturing complex, non-linear relationships that were previously overlooked by traditional models. As a result, our deep learning model achieved an impressive accuracy of 98% for person trip prediction and 96% for vehicle trip estimation. This represents a significant improvement over the performances of traditional transportation planning models, thereby demonstrating the power of deep learning in this domain. The implications of this study extend beyond just more accurate predictions. By enhancing the accuracy and reliability of trip prediction models, planners can formulate more effective, data-driven transportation policies, infrastructure, and services. As such, our research underscores the need for the transportation planning field to embrace advanced techniques like deep learning. The detailed methodology, along with a thorough discussion of the results and their implications, are presented in the subsequent sections of this paper.
    Width and Depth Limits Commute in Residual Networks. (arXiv:2302.00453v2 [stat.ML] UPDATED)
    We show that taking the width and depth to infinity in a deep neural network with skip connections, when branches are scaled by $1/\sqrt{depth}$ (the only nontrivial scaling), result in the same covariance structure no matter how that limit is taken. This explains why the standard infinite-width-then-depth approach provides practical insights even for networks with depth of the same order as width. We also demonstrate that the pre-activations, in this case, have Gaussian distributions which has direct applications in Bayesian deep learning. We conduct extensive simulations that show an excellent match with our theoretical findings.

  • Open

    [D] Running massive language models with Petals
    My observations and opinions on using Petals to run distributed LLMs, as a host and a user. https://yak.ventures/2023/08/11/distributed-llms-with-petals/ I'd be very interested to talk to anyone that is utilizing a distributed model for daily use or for an application and even more interested to talk to anyone running a model among friends or colleagues in the private mode. submitted by /u/Ruleryak [link] [comments]  ( 9 min )
    [R] Hi all, I am doing a research paper (high school) on ethics in AI art. I would greatly appreciate it if you took the time to fill in this survey. Thank you
    Here submitted by /u/TommZ5 [link] [comments]  ( 8 min )
    [D] Does RLHF increase the time horizon of models?
    RL techniques in general have the ability to increase the time horizon of models since future rewards impact the Q value or advantage of the current action. My understanding of RLHF is a reward model is trained based on human feedback, and then the LLM is optimized to maximize the reward from the reward model. Is this correct? If so, does the reward model care about future rewards? Does this impact the time horizon? submitted by /u/30299578815310 [link] [comments]  ( 9 min )
    [D] Confusion in the DL-based Keras Embedding and Dense Layer
    I am new to this field and feeling quite confused. I need to use a DL-based Keras Embeddings technique with the Dense layer for text classification (specifically, a Binary Classification problem), along with TF-IDF featurization as input for the Random Forest Algorithm. However, my confusion arises from the fact that the Keras Embedding Layer also serves as a featurization technique. Therefore, I'm uncertain whether this layer should be used as input for the Random Forest or it has the capability to classify text on its own. Second question is that what can be the reason to use Dense Layer and What it is exactly here. submitted by /u/ZahidAlee [link] [comments]  ( 9 min )
    Does this cover the basics/necessities of AI/ML [D] ?
    Hello. Trying to make a plan so I can chip away at stuff day-by-day over the next few months/year(s). I was wondering if I've classified everything in this diagram in the correct way or if I'm missing anything ? Reason I ask here is I'm not too sure if I'm missing anything obsecure or if I've misinterpreted anything ? Thank you ! https://preview.redd.it/dqbgfbcthjhb1.png?width=4299&format=png&auto=webp&s=1ceb45c3e4237f2204151bfed7bf45b54e4a5d68 submitted by /u/EngineerOwn6160 [link] [comments]  ( 9 min )
    [D] How Do Various Regularization Techniques Affect the Loss Surface?
    I'm currently working through "Understanding Deep Learning" by Simon J.D. Prince. On page 403, he makes the following statement about regularization: Another possible explanation for the ease with which models are trained is that regularization makes the loss surface flatter and more convex. From my understanding, L2 regularization (or weight decay) indeed adds a convex term λ∣∣w∣∣2 to the loss function, smoothing it out. Additionally, the Hessian matrix becomes more positive with the addition of the regularization term 2λI, giving the function a more convex characteristic. However, I'm puzzled as to how other regularization methods like Dropout, L1/Lasso, or Early Stopping might lead to a similarly flatter and more convex loss surface. Can anyone offer insights or explanations on this? submitted by /u/spontanurlaub [link] [comments]  ( 9 min )
    [D] Lessons from this years Neurips
    This years Neurips has been a rollercoaster for everyone involved. Petar Veličković says that in their AC batch 65% submitted no rebuttal or withdrew. https://twitter.com/PetarV_93/status/1689648854646575105 Xin Eric Wang says in their batch pre-rebuttal no papers had an avg score above an weak accept. https://twitter.com/xwang_lk/status/1686517898108674048 Will NeurIPS keep 25% acceptance rate? What do you think will happen to neurips in light of the above? Is this the end of big ML confs? ​ submitted by /u/SuchOccasion457 [link] [comments]  ( 9 min )
    [R] 3D Gaussian Splatting for Real-Time Radiance Field Rendering
    submitted by /u/individual_kex [link] [comments]  ( 8 min )
    [D] Implementing siamese network with MultipleNegativesRankingLoss in Keras/TF
    Hi! I have been trying to find a good guide of how best to implement a Sentence Transformers style model using Keras, but have not found anything :( I have managed to get something running, but I am not sure it is pretty and wanted to see if anyone know how to improve it or maybe has seen a nice implementation on the web? Here is my first draft https://gist.github.com/ydennisy/fec55fab84d107b72852ba2d2c2b61db submitted by /u/Suspicious_Dress_350 [link] [comments]  ( 9 min )
    [D] How does Lora save memory footprint for transformers?
    I can understand part of the statement if you are using Adam. Since the trainable params are much less, we are saving on optimizer states. However, even we are not actually updating the pretrained model, we still need to compute the graidients for backpropagation to the lower layer of the lora head. The memory usage of gradients would not decrease. Please correct me if I am wrong. submitted by /u/Chen806 [link] [comments]  ( 9 min )
    [D]How to Improve YOLO v8 model performance ?
    Hi everyone! I'm working on a model using YOLO v8x to detect regions on identity cards, but it struggles with identifying address regions. This issue seems to stem from insufficient data. Would it be advisable to incorporate additional data containing addresses(other documents instead of identity card) to enhance the model's accuracy in detecting address regions? submitted by /u/Ordinary_Run_2513 [link] [comments]  ( 9 min )
    [D] Why does using multiple gpus lead to slower performance?
    I read that using multiple gpus can improve inference performance, but I'm not sure why for my inference, its actually slower as I increase tensor_parallel_size. I know data transfer overhead and limited parallelism could be potential issues, are there ways to rectify this vllm = LLM( model="mosaicml/mpt-7b-instruct", trust_remote_code=True, dtype="float16", tensor_parallel_size=1, gpu_memory_utilization=.95, ) CPU times: user 3.66 s, sys: 262 ms, total: 3.93 s Wall time: 1.11 s vllm = LLM( model="mosaicml/mpt-7b-instruct", trust_remote_code=True, dtype="float16", tensor_parallel_size=2, gpu_memory_utilization=.95, ) CPU times: user 65.5 ms, sys: 32.2 ms, total: 97.7 ms Wall time: 1.27 s ​ submitted by /u/candyman54 [link] [comments]  ( 9 min )
    [D] How we evaluated LLMs in prod
    This is going to be a post about the challenges I faced while working with ChatGPT in my previous company and the things we did to overcome them over a 2+ month struggle. Check us out at www.twilix.io if anything below resonates with you and I hope you find some of it helpful. So to begin, in my previous company we invested a few months building a chatbot to help with user onboarding. At first everything was great, and we saw a 40% decrease in drop-off rates (which is significant given we were building a consumer facing app), but somehow over time this drop-off rate started creeping up again. Perplexed by the unexpected turn in metrics, management started to question the benefits of maintaining this chatbot and was skeptical that we were cherry picking examples to showcase its performance…  ( 10 min )
    [D] Train Stable Diffusion/Latent diffusion from scratch
    I'm currently in the process of developing a stable diffusion/latent diffusion model entirely from scratch. However, I'm a bit confused from the documentation of the original repositories (both from CompVis). My intention is to experiment with significantly smaller models and datasets while retaining the same architecture. Unfortunately, neither repository offers an official configuration for training the txt2img architecture.Through my exploration of the issues, I've observed that the training script provided by the latent diffusion repository does support txt2img (although an official configuration has not been made available yet). I'm curious if any of you might be familiar with better online resources or tutorials that can provide a clearer and more comprehensive understanding of the training process. submitted by /u/Arabum97 [link] [comments]  ( 9 min )
    [R] Tiny LVLM-eHub: Early Multimodal Experiments with Bard - OpenGVLab, Shanghai AI Laboratory 2023 - Encourages innovative strategies aimed at advancing multimodal techniques!
    Paper: https://github.com/OpenGVLab/Multi-Modality-Arena Github: https://github.com/OpenGVLab/Multi-Modality-Arena Abstract: Recent advancements in Large Vision-Language Models (LVLMs) have demonstrated significant progress in tackling complex multimodal tasks. Among these cutting-edge developments, Google's Bard stands out for its remarkable multimodal capabilities, promoting comprehensive comprehension and reasoning across various domains. This work presents an early and holistic evaluation of LVLMs' multimodal abilities, with a particular focus on Bard, by proposing a lightweight variant of LVLM-eHub, named Tiny LVLM-eHub. In comparison to the vanilla version, Tiny LVLM-eHub possesses several appealing properties. Firstly, it provides a systematic assessment of six categories of multimodal capabilities, including visual perception, visual knowledge acquisition, visual reasoning, visual commonsense, object hallucination, and embodied intelligence, through quantitative evaluation of 42 standard text-related visual benchmarks. Secondly, it conducts an in-depth analysis of LVLMs' predictions using the ChatGPT Ensemble Evaluation (CEE), which leads to a robust and accurate evaluation and exhibits improved alignment with human evaluation compared to the word matching approach. Thirdly, it comprises a mere 2.1K image-text pairs, facilitating ease of use for practitioners to evaluate their own offline LVLMs. Through extensive experimental analysis, this study demonstrates that Bard outperforms previous LVLMs in most multimodal capabilities except object hallucination, to which Bard is still susceptible. Tiny LVLM-eHub serves as a baseline evaluation for various LVLMs and encourages innovative strategies aimed at advancing multimodal techniques. https://preview.redd.it/i6x6p5bloihb1.jpg?width=1485&format=pjpg&auto=webp&s=7e91fe184844278b0a7e14090ae9aaef54b29f37 ​ ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [P] GPT Sequence Classification explainability or interpretability
    I’m using GPT-2 for Sequence Classification. I want to understand the words or sequences that lead to the predictions. Can you point me towards any papers, repos or libraries? submitted by /u/how_the_turn_tablez [link] [comments]  ( 8 min )
    [D] What's is everyones outlook on AI swarms? Does it hold promise, or are larger systems going to be dominant?
    I've been researching AI swarms, and it seems to make more sense to have a lot of smaller models doing tasks separately. Thoughts? submitted by /u/deepengineai [link] [comments]  ( 8 min )
    [D] Is Hidden Size in current transformers an overkill?
    Hi, I have written a post discussion whether or not the hidden size in transformers is an overkill. TLDR; I show that an embedding size of 2048 is too much to represent just one token like `is` but rather, it can encode an average 8 tokens with up-to 16 tokens almost losslessly. I think if we can design more compute efficient transformers with some of the ideas that I explore in the post. Of course this is not a proper research with ablation studies and empirical analysis. But I would love to hear your thoughts on this topic. submitted by /u/NaxAlpha [link] [comments]  ( 9 min )
    [P] Using Machine Learning for Accesibility: Personal AI Shelf Inspector for Visually Impaired Persons
    Personal Shelf Inspector is an application that helps visually impaired people during their day-to-day shopping. The application is based on a simple neural network and was created as a part of the AI for Accesibility Hackathon in 2020 in Prague. We decided to build free tools that make shopping for visually impaired people more accessible. These tools can be implemented in any retail chain within the loyalty app or in-store. ​ https://preview.redd.it/ehoxec5l4hhb1.png?width=1072&format=png&auto=webp&s=c96a5dd326cbfc8dd9bde9d84d45167d172ea27d The Idea A visually impaired person only needs their smartphone to use our tools. Personal Shelf Inspector is a web application that reads the price and product name from a price tag. The algorithm selects the price tag closest to the centre of the photo and sends it to the model, which reads the price and product name on the price tag. Then it returns this information to the application, which appears as text on the screen. ​ https://preview.redd.it/e098rtcn4hhb1.png?width=845&format=png&auto=webp&s=73a216ebf987e476d52091717ac156ac56daa29b The voice-over built into the user's mobile phone reads this text aloud. The app also helps to read the banknote values and read from a live video. What could be other cool ideas and concepts to help making the world more accesible using AI and Machine Learning? Feel free to share comments and impressions in the comments submitted by /u/DataSentics [link] [comments]  ( 9 min )
    [Research] How InstructBLIP's authors do the datasets transformation to instruction data
    In "InstructBLIP" paper, authors say: "We transform 26 datasets into the instruction tuning format" in order to create a general-purpose vision language model via instruction tuning. However, they did not provide details on how they did this transformation. At a first glance, three ways come to mind: They use ChatGPT/GPT-4 to automatically transform them. They define and code rules to automatically transform them. They manually transform them (highly improbable) Someone knows the answer? Thank you so much submitted by /u/jrodriguezortega [link] [comments]  ( 9 min )
    [R] Open-Source Machine Learning in Computational Chemistry
    We wrote a perspective on open source machine learning in computational chemistry in JCIM_JCTC. It was an incredible amount of work and I hope readers will find it useful and educational. https://pubs.acs.org/doi/10.1021/acs.jcim.3c00643 If you need a preprint, you can find it on Researchgate. https://www.researchgate.net/publication/372470285_Open-Source_Machine_Learning_in_Computational_Chemistry submitted by /u/poorgenes [link] [comments]  ( 9 min )
    [R] Neural Wave Machines: Learning Spatiotemporally Structured Representations with Locally Coupled Oscillatory Recurrent Neural Networks
    submitted by /u/hardmaru [link] [comments]  ( 8 min )
    [D]: Single Board Computer with accelerator as a hobby project
    Does anybody have a good recommendation for an SBC with AI accelerator (NPU) where I could attach a camera and train some YOLO models on the device itself for object recognition? submitted by /u/LM1117 [link] [comments]  ( 8 min )
  • Open

    Hi all, I am doing a research paper (high school) on ethics in AI art. I would greatly appreciate it if you took the time to fill in this survey. Thank you!
    Link to survey submitted by /u/TommZ5 [link] [comments]  ( 8 min )
    OpenAI CEO Sam Altman donates $200,000 to Biden campaign
    submitted by /u/micahdjt1221 [link] [comments]  ( 8 min )
    AI — weekly megathread!
    This week in AI - provided by aibrews.com feel free to follow their newsletter News and Insights Anthropic released a new version of Claude Instant, which offers faster performance at a lower price, with improvements in quote extraction, multilingual support, and question answering. It hallucinates less and is more resistant to jailbreaks [Details]. Stability AI announced the release of StableCode, its first LLM generative AI product for coding [Details]. Researchers present AudioLDM 2, a framework that utilizes the same learning method for speech, music, and sound effect generation [Details | GitHub]. Researchers from CMU and others conducted tests on 14 large language models and found that OpenAI’s ChatGPT and GPT-4 were the most left-wing libertarian, while Meta’s LlaMA was the m…  ( 10 min )
    Pika Labs: Tutorial for Beginners (Text-to-Video Platform)
    submitted by /u/SplitYOLO [link] [comments]  ( 8 min )
    Commercial for BBC Planet Earth used AI
    submitted by /u/Grindmaster_Flash [link] [comments]  ( 8 min )
    Medication Mix-up Incident Involving My Mother
    submitted by /u/Rightperson1 [link] [comments]  ( 8 min )
    Client project matching AI recommendations?
    At my company, we collaborate closely with top-level executives from Fortune 500 companies and other industry leaders, helping them identify and secure the right partners for crucial digital transformation initiatives. When these executives present us with their project specifics, budgets, obstacles, and schedules, we take charge of finding the right partners for their RFP process, enhancing the entire workflow for efficiency and effectiveness. Currently, I have a collection of RFP projects and I’m keen on leveraging AI to simplify the task of identifying potential partners to call. I provided ChatGPT with all of my various project details and would inquire, ‘Which of my client projects align well with X company, and what are the reasons?’ OR “Would X company align with any of my projects?” The AI started off well, but eventually became confused and started making mistakes. Are there any systems available that could assist me in this project matching process? submitted by /u/Ajkrouse [link] [comments]  ( 9 min )
    VQA Recommendations, anyone?
    Hi, what VQA platforms do you all have experience with? What would you think would be the most promising platform at the moment, and in the future? I've been playing around with Google Vertex AI (https://console.cloud.google.com/vertex-ai/generative/) but the current results are ... meh! 🤷‍♂️ Any other recommendations? submitted by /u/emc [link] [comments]  ( 8 min )
    One-Minute Daily AI News 8/11/2023
    A new AI algorithm has detected a potentially hazardous asteroid that had gone unnoticed by human observers, slated to fly by Earth. The algorithm, HelioLinc3D, was explicitly designed for the Vera Rubin Observatory currently under construction in Northern Chile.[1] The U.S. Defense Department has created a task force to evaluate and guide the application of generative artificial intelligence for national security purposes, amid an explosion of public interest in the technology.[2] China’s largest web and cloud providers (Alibaba, Baidu, ByteDance, and Tencent)are lining up to buy as many Nvidia GPUs as they can while they still can get their hands on them.[3] At Black Hat USA 2023, DARPA issued a call to top computer scientists, AI experts, software developers, and beyond to participate in the AI Cyber Challenge (AIxCC) – a two-year competition aimed at driving innovation at the nexus of AI and cybersecurity to create a new generation of cybersecurity tools.[4] Sources: [1] https://www.giantfreakinrobot.com/sci/ai-asteroids.html [2] https://www.c4isrnet.com/artificial-intelligence/2023/08/10/pentagon-establishes-task-force-lima-to-study-generative-ai-issues/ [3] https://www.theregister.com/2023/08/11/chinese_web_giants_nvidia/ [4] https://www.hstoday.us/industry/industry-news/darpa-ai-cyber-challenge-aims-to-secure-nations-most-critical-software/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    AI Agents Simulate a Town 🤯 Generative Agents: Interactive Simulacra of Human Behavior.
    submitted by /u/crua9 [link] [comments]  ( 8 min )
    An Extension LLM Model That Also Analyzes Page Text
    I have developed this chrome extension named Lupin which allows you to ask your question about your current tab directly to chatGPT by analysing the page's body. For instance, if you're looking into an Amazon product, you can ask your question about it directly to Lupin. https://chrome.google.com/webstore/detail/lupin/kdfaiheakopcdabhlcnbmfjffanaedgm?hl=en&authuser=0 Right now, this is an open-beta phase, so I am open to any feedback. I have improved some aspects based on the feedback I received but I want to improve as much as possible before going for version 1.1 If you wanna join me on this crusade and work together, DM me. Amor Fati, AAC submitted by /u/AttilaTheHappyHun [link] [comments]  ( 9 min )
    RVC AI samples examples
    Hello, is there anywhere I can find .wav files to see examples about how would be the ideal type of samples I should provide my AI so it learns a more wide register of my voice? I didn't manage to find anything like that Sorry if it's a newbie question submitted by /u/Callumpi [link] [comments]  ( 8 min )
  • Open

    Fine-Tuning Llama-2: A Comprehensive Case Study for Tailoring Models to Unique Applications
    submitted by /u/nickb [link] [comments]  ( 8 min )
  • Open

    Challenges and solutions in Big Data management
    Big Data Management has become a pivotal part of modern business, influencing decisions, shaping strategies, and offering unparalleled insights. With the exponential growth of data from myriad sources, managing it effectively is more critical than ever. However, big data’s sheer volume, variety, and velocity present a unique set of challenges. These challenges range from integration… Read More »Challenges and solutions in Big Data management The post Challenges and solutions in Big Data management appeared first on Data Science Central.  ( 21 min )
    Pushing boundaries with Generative AI: How Program-aided Language model (PAL) enhances Large Language Models (LLMs) for superior AI performance
    Source: ArabianBusiness Takeaways Artificial Intelligence (AI) continues to evolve at a rapid pace, with groundbreaking strides in generative capabilities playing a critical role in defining this ever-evolving landscape. One such transformative leap is the advent of Program-Aided Language models (PAL), an innovative solution that revolutionizes how Language Learning Models (LLMs) function. This article delves into… Read More »Pushing boundaries with Generative AI: How Program-aided Language model (PAL) enhances Large Language Models (LLMs) for superior AI performance The post Pushing boundaries with Generative AI: How Program-aided Language model (PAL) enhances Large Language Models (LLMs) for superior AI performance appeared first on Data Science Central.  ( 22 min )
  • Open

    Any suggestions on how I can improve my vision based PPO algorithm
    I am planning to throw my algorithm into a pronto server which enable me to increase the number of parallel workers. Currently, I am going with 24 workers. I'd appreciate more suggestions. Here's the pastebin link with syntax highlighting. Here's my code - #Modified this code - https://github.com/DeepReinforcementLearning/DeepReinforcementLearningInAction/blob/master/Chapter%204/Ch4_book.ipynb #Also, modified this code - https://github.com/higgsfield/RL-Adventure-2/blob/master/1.actor-critic.ipynb # Also, modified this code - https://github.com/ericyangyu/PPO-for-Beginners/blob/9abd435771aa84764d8d0d1f737fa39118b74019/ppo.py#L151 # Got a lot of help from the subreddit - reinforcement_learning if __name__ == '__main__': import numpy as np import gymnasium as gym from gymnasium.wrappers im…  ( 11 min )
    🐑 Dreamer V3 in SheepRL 🐑
    Hi everyone, we finally ended our journey through Dreamer, and we released the last version, Dreamer V3 in SheepRL. Our implementation follows closely the author's one, and is very well documented, with a blog post to explain the details and differences between this version and Dreamer V2. Together with Dreamer, we also have Plan2Explore with Dreamer v1 and v2. Finally, we completed the integration with Diambra, so you can try your agents on new (funnier) benchmarks. Check it out and feel free to contribute. Every feedback is appreciated :) submitted by /u/TrottoDng [link] [comments]  ( 9 min )
    What's the difference between GVF and Options?
    Two cool concepts - General Value Functions & Options. Seem to be for the same purpose. ​ What are the differences between these 2 strategies, and what are the benefits of each? Thanks! submitted by /u/Cultural-Average3959 [link] [comments]  ( 8 min )
  • Open

    Amazon Translate enhances its custom terminology to improve translation accuracy and fluency
    Amazon Translate is a neural machine translation service that delivers fast, high-quality, affordable, and customizable language translation. When you translate from one language to another, you want your machine translation to be accurate, fluent, and most importantly contextual. Domain-specific and language-specific customizable terminology is a key requirement for many government and commercial organizations. Custom terminology […]  ( 5 min )
    Zero-shot text classification with Amazon SageMaker JumpStart
    Natural language processing (NLP) is the field in machine learning (ML) concerned with giving computers the ability to understand text and spoken words in the same way as human beings can. Recently, state-of-the-art architectures like the transformer architecture are used to achieve near-human performance on NLP downstream tasks like text summarization, text classification, entity recognition, […]  ( 11 min )

  • Open

    THIS Is What Comes Next For AI - The Simulation | Interview with Fable Studio CEO - Edward Saatchi
    submitted by /u/Sonic_Improv [link] [comments]  ( 8 min )
    Please help me understand these files
    I'm working on a skin cancer detection app where I can upload a picture of a mole or other skin lesion and have it tell me if its cancerous and what type of cancer it is, and I downloaded the HAM10000 database for it which came with 5 CSV files. I kind of understand the metadata CSV file but the other 4 don't make sense to me. They have a bunch of numbers and either L or RGB at the end of the file names. Can someone help me make sense of these? submitted by /u/timing_snow [link] [comments]  ( 9 min )
    Images on the subject of AI.
    submitted by /u/Philipp [link] [comments]  ( 8 min )
    Nvidia unveils GH200 Superchips for 'most complex AI workloads'
    submitted by /u/intengineering [link] [comments]  ( 8 min )
  • Open

    [D] OpenAI API function calling
    How do you think OpenAI implemented the function calling feature, It seems like another contextual generation piece from the look of it but any interesting ideas and papers around this topic? submitted by /u/neuro_boogie [link] [comments]  ( 8 min )
    LLMs Challenges and Approaches Panel [N]
    ​ https://preview.redd.it/wl1gtcngnchb1.jpg?width=1500&format=pjpg&auto=webp&s=24e35d852603c6139fd67f79457ec593fbad99f7 If you're someone who's curious about or working with LLMs there's a cool panel discussion coming up: Comparing the pros and cons of using existing LLMs, prompt engineering, and fine-tuning on custom datasets for different enterprise use cases. Fine-Tuning LLMs: Exploring the advantages and challenges of fine-tuning LLMs on custom datasets to align with specific business objectives. Tools and platforms: Discussing the various tools and platforms to facilitate LLM implementation Overcoming Challenges: Addressing the challenges associated with adopting LLMs, including data privacy, creating high quality datasets, computational resources, ethical considerations, and the need for specialized expertise. Future Directions: Exploring emerging trends, advancements, and potential future applications of LLMs in the enterprise context. Here's the event info: https://www.eventbrite.com/e/large-language-models-for-enterprise-success-challenges-and-approaches-tickets-695089811337?aff=oddtdtcreator submitted by /u/UpstairsLeast7642 [link] [comments]  ( 9 min )
    [D] List of Awesome AI Agents like AutoGPT and BabyAGI / Many open-source Agents with code included!
    Github: https://github.com/e2b-dev/awesome-ai-agents and https://github.com/EmbraceAGI/Awesome-AGI submitted by /u/Singularian2501 [link] [comments]  ( 8 min )
    [D] 🎹 Record Labels are monetizing AI-created Music after trying to kill it.
    Google and Universal Music are discussing licensing artists' voices and melodies to develop AI-generated songs fans can create and pay for, seeking to get ahead of the controversial "deepfake" music trend. Though some stars oppose their work being mimicked, artists could opt-in to receive royalties in a model akin to how YouTube now pays for user-generated content. For Google, AI music would boost its generative AI offerings against competitors. But significant ethical hurdles around consent and IP must still be addressed in developing a legitimate AI music market. submitted by /u/Yavero [link] [comments]  ( 9 min )
    [d] transformers for video activity recognition?
    I am trying to work with the UCF crime dataset and want to use transformers for video activity recognition, Does anyone have pointers to example projects as to which ones are good starting points? submitted by /u/bluzkluz [link] [comments]  ( 8 min )
    [P] I ran Llama 2 on my Mac in < 5 mins
    So Llama 2 sounds awesome, but I really wanted to run it locally on my Macbook Pro instead of on a Linux box with an NVIDIA GPU. So I put the llama.cpp GGML models into the XetHub Llama 2 repo so I can use the power of Llama 2 locally. It now takes me 5 seconds to mount Llama 2 and it loads the GGML model almost instantly. Here’s how I did it: Create an account: Go to xethub.com and Sign In with GitHub Quick start: Go to xethub.com/explore/quickstart and follow the Install & Setup steps (xethub.com/explore/install) pip install pyxet for Python SDK and CLI Set up authentication: Create a Personal Access Token and then run the login command from a Terminal so your ~/.xetconfig is set up with your login token. Here’s the code to get Llama 2 up and running on your Mac laptop in a few …  ( 12 min )
    [R] Benchmarking g5.12xlarge (4xA10) vs 1xA100 inference performance running upstage_Llama-2-70b-instruct-v2 (4-bit & 8-bit)
    Hi Reddit folks, I wanted to share some benchmarking data I recently compiled running upstage_Llama-2-70b-instruct-v2 on two different hardware setups. If you'd like to see the spreadsheet with the raw data you can check out this link. Hardware Config #1: AWS g5.12xlarge - 4 x A10 w/ 96GB VRAM Hardware Config #2: Vultr - 1 x A100 w/ 80GB VRAM A few questions I wanted to answer: How does the inference speed (tokens/s) between these two configurations compare? How does the number of input tokens impact inference speed? How many input tokens can these machines handle before they start to hit OOM? How does 4-bit vs 8-bit quantization affect all of the above? Why this model? I chose upstage_Llama-2-70b-instruct-v2 because it's the current #1 performing OS model on HuggingFace's LLM…  ( 10 min )
    [R] Discovering Adaptable Symbolic Algorithms from Scratch - Google and MSU
    Autonomous robots deployed in the real world will need control policies that rapidly adapt to environmental changes. To this end, we propose AutoRobotics-Zero (ARZ), a method based on AutoML-Zero that discovers zero-shot adaptable policies from scratch. In contrast to neural network adaption policies, where only model parameters are optimized, ARZ can build control algorithms with the full expressive power of a linear register machine. We evolve modular policies that tune their model parameters and alter their inference algorithm on-the-fly to adapt to sudden environmental changes. We demonstrate our method on a realistic simulated quadruped robot, for which we evolve safe control policies that avoid falling when individual limbs suddenly break. This is a challenging task in which two popular neural network baselines fail. Finally, we conduct a detailed analysis of our method on a novel and challenging non-stationary control task dubbed Cataclysmic Cartpole. Results confirm our findings that ARZ is significantly more robust to sudden environmental changes and can build simple, interpretable control policies. Paper: https://arxiv.org/abs/2307.16890 Video: https://youtu.be/sEFP1Hay4nE submitted by /u/VishDev [link] [comments]  ( 9 min )
    [R} On Hate Scaling Laws For Data-Swamps
    submitted by /u/VishDev [link] [comments]  ( 8 min )
    [R] Heat-assisted detection and ranging - Nature
    Machine perception uses advanced sensors to collect information about the surrounding scene for situational awareness. State-of-the-art machine perception using active sonar, radar, and LiDAR to enhance camera vision faces difficulties when the number of intelligent agents scales up. Exploiting omnipresent heat signals could be a new frontier for scalable perception. However, objects and their environment constantly emit and scatter thermal radiation, leading to textureless images famously known as the ‘ghosting effect’. Thermal vision thus has no specificity limited by information loss, whereas thermal ranging—crucial for navigation—has been elusive even when combined with artificial intelligence (AI). Here, we propose and experimentally demonstrate heat-assisted detection and ranging (HADAR) overcoming this open challenge of ghosting and benchmark it against AI-enhanced thermal sensing. HADAR not only sees texture and depth through the darkness as if it were day but also perceives decluttered physical attributes beyond RGB or thermal vision, paving the way to fully passive and physics-aware machine perception. We develop HADAR estimation theory and address its photonic shot-noise limits depicting information-theoretic bounds to HADAR-based AI performance. HADAR ranging at night beats thermal ranging and shows an accuracy comparable with RGB stereovision in daylight. Our automated HADAR thermography reaches the Cramér–Rao bound on temperature accuracy, beating existing thermography techniques. Our work leads to a disruptive technology that can accelerate the Fourth Industrial Revolution (Industry 4.0) with HADAR-based autonomous navigation and human–robot social interactions. Paper: https://www.nature.com/articles/s41586-023-06174-6 Video: https://youtu.be/WKrzmaixAC0 submitted by /u/VishDev [link] [comments]  ( 9 min )
    [D] Is everything just transformers now?
    I was watching this talk where they were showing that basically every task in machine learning has been replaced by the transformer architecture. For instance, where a convolution neural network might have been used for image recognition in the past, the predominant strategy now is just to use a transformer instead. How true is this? Is it worth learning any other architecture than transformers for current state of the art research? submitted by /u/Active-Confidence926 [link] [comments]  ( 9 min )
    [D] Is Latent ODE an imputation model?
    Hello, Is Latent ODE an imputation model? If so, how does it handle missing values in the case of irregular sampled time series data? Latent ODE - Latent ODEs for Irregularly-Sampled Time Series (https://arxiv.org/abs/1907.03907) submitted by /u/flaubart9 [link] [comments]  ( 8 min )
    [P] txtai 6.0 - the all-in-one embeddings database
    txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows. This major release adds sparse, hybrid and subindexes to the embeddings interface. It also makes significant improvements to the LLM pipeline workflow. See links below for more. GitHub: https://github.com/neuml/txtai Release Notes: https://github.com/neuml/txtai/releases/tag/v6.0.0 Article: https://medium.com/neuml/whats-new-in-txtai-6-0-7d93eeedf804 submitted by /u/davidmezzetti [link] [comments]  ( 9 min )
    [D] Ideal embedding models for classifying news articles to topics, specified as sentences
    I’m looking to build functionality that would allow a user to specify topics to be notified about in the news, eg. “Tax law changes in New York”, and notify them of recently published news articles related to that topic. Would the ideal strategy be to find relating articles to topics, or topics relating to articles as they come in? What models would be ideal here? I’m fairly new to this, so any help would be appreciated. submitted by /u/ByteBuff [link] [comments]  ( 9 min )
    [D] Intermediate/Advanced AI/ML Bootcamps
    Long time listener, first time caller. I am looking to spearhead the impending transition of understanding AI/ML at my organization and am looking for community suggestions in courses and bootcamps that could provide a deeper knowledge for some of my work projects in the future. Specifically, I feel unequipped in how to properly test and validate models. I have a computer science background with strong skills in data analytics and programming. I’ve also taken several introductory courses at a high level for AI. Does anyone have suggestions or experiences for 1-4 week long bootcamps or intensive courses? I prefer in-person (anywhere in US) but would also consider live-online remote courses. Price is not a concern. Thanks in advance. submitted by /u/DungeonsGalore [link] [comments]  ( 9 min )
  • Open

    skrl with multiple discrete actions
    I'm new to RL, and I was trying to train an agent to move items in a 2D grid. The agent needs to output the row number, column number, and item index, and right now I'm modeling them as discrete actions. I am not sure what kind of agent to use to solve this problem. I tried PPO, but I'm not sure what the output of the policy module should be in this case. I'd be grateful for any help. submitted by /u/LostPigeon25 [link] [comments]  ( 9 min )
    Implement parallel training using the multiprocessing module.
    This project allows you to easily implement parallel training with the multiprocessing module. submitted by /u/NoteDancing [link] [comments]  ( 8 min )
  • Open

    Understanding the future of smart cities through data science
    Learn about the challenges of data privacy and security, and the potential of smart technologies in creating efficient, livable urban environments. The post Understanding the future of smart cities through data science appeared first on Data Science Central.  ( 20 min )
  • Open

    Microsoft at KDD 2023: Advancing health at the speed of AI
    This content was given as a keynote at the Workshop of Applied Data Science for Healthcare and covered during a tutorial at the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, a premier forum for advancement, education, and adoption of the discipline of knowledge discovering and data mining. Recent and noteworthy advancements in […] The post Microsoft at KDD 2023: Advancing health at the speed of AI appeared first on Microsoft Research.  ( 12 min )
  • Open

    Build a centralized monitoring and reporting solution for Amazon SageMaker using Amazon CloudWatch
    In this post, we present a cross-account observability dashboard that provides a centralized view for monitoring SageMaker user activities and resources across multiple accounts. It allows the end-users and cloud management team to efficiently monitor what ML workloads are running, view the status of these workloads, and trace back different account activities at certain points of time.  ( 12 min )
  • Open

    Creating a Traveling Salesman Tour of Texas with Mathematica
    A Traveling Salesman tour visits a list of destinations using the shortest path. There’s an obvious way to find the shortest path connecting N points: try all N! paths and see which one is shortest. Unfortunately, that might take a while. Texas has 254 counties, and so calculating a tour of Texas counties by brute […] Creating a Traveling Salesman Tour of Texas with Mathematica first appeared on John D. Cook.  ( 6 min )
  • Open

    Challenge Accepted: GeForce NOW Fires Up the Cloud With Ultimate Challenge and First Bethesda Games
    Rise and shine, it’s time to quake up — the GeForce NOW Ultimate KovaaK’s challenge kicks off at the QuakeCon gaming festival today, giving gamers everywhere the chance to play to their ultimate potential with ultra-high 240 frames per second streaming. On top of bragging rights, top scorers can win some sweet prizes — including Read article >  ( 7 min )

  • Open

    Confidence-Building Measures for Artificial Intelligence: Workshop proceedings
    No content preview  ( 2 min )

  • Open

    Frontier Model Forum
    We’re forming a new industry body to promote the safe and responsible development of frontier AI systems: advancing AI safety research, identifying best practices and standards, and facilitating information sharing among policymakers and industry.  ( 4 min )

  • Open

    Moving AI governance forward
    OpenAI and other leading labs reinforce AI safety, security and trustworthiness through voluntary commitments.  ( 5 min )

  • Open

    Custom instructions for ChatGPT
    We’re rolling out custom instructions to give you more control over how ChatGPT responds. Set your preferences, and ChatGPT will keep them in mind for all future conversations.  ( 6 min )

  • Open

    Partnership with American Journalism Project to support local news
    A new $5+ million partnership aims to explore ways the development of artificial intelligence (AI) can support a thriving, innovative local news field, and ensure local news organizations shape the future of this emerging technology.  ( 3 min )

  • Open

    Implementing Gradient Descent in PyTorch
    The gradient descent algorithm is one of the most popular techniques for training deep neural networks. It has many applications in fields such as computer vision, speech recognition, and natural language processing. While the idea of gradient descent has been around for decades, it’s only recently that it’s been applied to applications related to deep […] The post Implementing Gradient Descent in PyTorch appeared first on MachineLearningMastery.com.  ( 25 min )

  • Open

    Training a Linear Regression Model in PyTorch
    Linear regression is a simple yet powerful technique for predicting the values of variables based on other variables. It is often used for modeling relationships between two or more continuous variables, such as the relationship between income and age, or the relationship between weight and height. Likewise, linear regression can be used to predict continuous […] The post Training a Linear Regression Model in PyTorch appeared first on MachineLearningMastery.com.  ( 24 min )
    Making Linear Predictions in PyTorch
    Linear regression is a statistical technique for estimating the relationship between two variables. A simple example of linear regression is to predict the height of someone based on the square root of the person’s weight (that’s what BMI is based on). To do this, we need to find the slope and intercept of the line. […] The post Making Linear Predictions in PyTorch appeared first on MachineLearningMastery.com.  ( 21 min )

  • Open

    Loading and Providing Datasets in PyTorch
    Structuring the data pipeline in a way that it can be effortlessly linked to your deep learning model is an important aspect of any deep learning-based system. PyTorch packs everything to do just that. While in the previous tutorial, we used simple datasets, we’ll need to work with larger datasets in real world scenarios in […] The post Loading and Providing Datasets in PyTorch appeared first on MachineLearningMastery.com.  ( 20 min )

  • Open

    Using Dataset Classes in PyTorch
    In machine learning and deep learning problems, a lot of effort goes into preparing the data. Data is usually messy and needs to be preprocessed before it can be used for training a model. If the data is not prepared correctly, the model won’t be able to generalize well. Some of the common steps required […] The post Using Dataset Classes in PyTorch appeared first on MachineLearningMastery.com.  ( 21 min )

  • Open

    Calculating Derivatives in PyTorch
    Derivatives are one of the most fundamental concepts in calculus. They describe how changes in the variable inputs affect the function outputs. The objective of this article is to provide a high-level introduction to calculating derivatives in PyTorch for those who are new to the framework. PyTorch offers a convenient way to calculate derivatives for […] The post Calculating Derivatives in PyTorch appeared first on Machine Learning Mastery.  ( 20 min )

  • Open

    Two-Dimensional Tensors in Pytorch
    Two-dimensional tensors are analogous to two-dimensional metrics. Like a two-dimensional metric, a two-dimensional tensor also has $n$ number of rows and columns. Let’s take a gray-scale image as an example, which is a two-dimensional matrix of numeric values, commonly known as pixels. Ranging from ‘0’ to ‘255’, each number represents a pixel intensity value. Here, […] The post Two-Dimensional Tensors in Pytorch appeared first on Machine Learning Mastery.  ( 21 min )

  • Open

    One-Dimensional Tensors in Pytorch
    PyTorch is an open-source deep learning framework based on Python language. It allows you to build, train, and deploy deep learning models, offering a lot of versatility and efficiency. PyTorch is primarily focused on tensor operations while a tensor can be a number, matrix, or a multi-dimensional array. In this tutorial, we will perform some […] The post One-Dimensional Tensors in Pytorch appeared first on Machine Learning Mastery.  ( 22 min )

  • Open

    365 Data Science courses free until November 21
    Sponsored Post   The unlimited access initiative presents a risk-free way to break into data science.     The online educational platform 365 Data Science launches the #21DaysFREE campaign and provides 100% free unlimited access to all content for three weeks. From November 1 to 21, you can take courses from renowned instructors and earn […] The post 365 Data Science courses free until November 21 appeared first on Machine Learning Mastery.  ( 15 min )

  • Open

    Attend the Data Science Symposium 2022, November 8 in Cincinnati
    Sponsored Post      Attend the Data Science Symposium 2022 on November 8 The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all day in-person event will have three featured speakers and two tech talk tracks with four concurrent presentations in each track. The […] The post Attend the Data Science Symposium 2022, November 8 in Cincinnati appeared first on Machine Learning Mastery.  ( 10 min )

  • Open

    My family's unlikely homeschooling journey
    My husband Jeremy and I never intended to homeschool, and yet we have now, unexpectedly, committed to homeschooling long-term. Prior to the pandemic, we both worked full-time in careers that we loved and found meaningful, and we sent our daughter to a full-day Montessori school. Although I struggled with significant health issues, I felt unbelievably lucky and fulfilled in both my family life and my professional life. The pandemic upended my careful balance. Every family is different, with different needs, circumstances, and constraints, and what works for one may not work for others. My intention here is primarily to share the journey of my own (very privileged) family. Our unplanned introduction to homeschooling For the first year of the pandemic, most schools in California, where …  ( 7 min )

  • Open

    The Jupyter+git problem is now solved
    Jupyter notebooks don’t work with git by default. With nbdev2, the Jupyter+git problem has been totally solved. It provides a set of hooks which provide clean git diffs, solve most git conflicts automatically, and ensure that any remaining conflicts can be resolved entirely within the standard Jupyter notebook environment. To get started, follow the directions on Git-friendly Jupyter. Contents The Jupyter+git problem The solution The nbdev2 git merge driver The nbdev2 Jupyter save hook Background The result Postscript: other Jupyter+git tools ReviewNB An alternative solution: Jupytext nbdime The Jupyter+git problem Jupyter notebooks are a powerful tool for scientists, engineers, technical writers, students, teachers, and more. They provide an ideal notebook environment for interact…  ( 7 min )
2023-09-09T00:40:55.014Z osmosfeed 1.15.1